Abstract

In this paper the problem of retrospective change-point detection and esti- 
mation in multivariate linear models is considered. The lower bounds for the 
error of change-point estimation are proved in different cases (one change-point: 
deterministic and stochastic predictors, multiple change-points). A new method 
for retrospective change-point detection and estimation is proposed and its main 
performance characteristics (type 1 and type 2 errors, the error of estimation) are 
studied for dependent observations in situations of deterministic and stochastic 
predictors and unknown change-points. We prove that this method is asymptot- 
ically optimal by the order of convergence of change-point estimates to their true 
values as the sample size tends to infinity. Results of a simulation study of the 
main performance characteristics of the proposed method in comparison with other
well-known methods of retrospective change-point detection and estimation are
presented. 

Keywords: change-point; retrospective detection and estimation; performance 
measure; asymptotic optimality 
1 Introduction



This paper deals with change-point problems for multivariate linear models. We begin 
with a short review of this field. 

The change-point problem for regression models was first considered by Quandt 
(1958, 1960). Using econometric examples Quandt proposed a method for estimation 
of a change-point in a sequence of independent observations based upon the likelihood 
ratio test. 

Let us describe the change-point problem for the linear regression models considered
in the literature. Let $y_1, y_2, \dots, y_n$ be independent random variables (i.r.v.'s). Under
the null hypothesis $H_0$ the linear model is

$$y_i = x_i^* \beta + e_i, \quad 1 \le i \le n,$$

where $\beta = (\beta_1, \beta_2, \dots, \beta_d)^*$ is an unknown vector of coefficients, $x_i^* = (1, x_{2i}, \dots, x_{di})$
are known predictors (here and below $*$ is the transposition symbol).

The errors $e_i$ are supposed to be independent identically distributed random variables
(i.i.d.r.v.'s) with $E e_i = 0$, $0 < \sigma^2 = \operatorname{var} e_i < \infty$.

Under the alternative hypothesis $H_1$ a change at the instant $k^*$ occurs, i.e.

$$y_i = \begin{cases} x_i^* \beta + e_i, & 1 \le i \le k^*, \\ x_i^* \gamma + e_i, & k^* < i \le n, \end{cases}$$

where $k^*$ and $\gamma \in \mathbb{R}^d$ are unknown parameters, and $\beta \ne \gamma$.
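Denote

$$\bar y_k = k^{-1} \sum_{1 \le i \le k} y_i, \quad \bar x_k = k^{-1} \sum_{1 \le i \le k} x_i, \quad Q_n = \sum_{1 \le j \le n} (x_j - \bar x_n)(x_j - \bar x_n)^*,$$

and $X_n = (x_1, x_2, \dots, x_n)^*$, $Y_n = (y_1, y_2, \dots, y_n)^*$.
The least squares estimate of $\beta$ is:

$$\hat\beta_n = (X_n^* X_n)^{-1} X_n^* Y_n.$$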

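Siegmund with co-authors (James, James, Siegmund (1989)) proposed to reject
$H_0$ for large values of $\max_{1 \le k < n} |U_n(k)|$, where

$$U_n(k) = \left( \frac{k}{1 - k/n} \right)^{1/2} \frac{\bar y_k - \bar y_n - \hat\beta_n (\bar x_k - \bar x_n)^*}{\left( 1 - k (\bar x_k - \bar x_n)(\bar x_k - \bar x_n)^* / (Q_n (1 - k/n)) \right)^{1/2}}.$$

Earlier, Brown, Durbin, and Evans (1975) used the cumulative sums of regression
residuals

$$R_n(k) = \sum_{1 \le i \le k} \left( y_i - \bar y_n - \hat\beta_n (x_i - \bar x_n)^* \right), \quad 1 \le k \le n.$$

It is easy to see that

$$U_n(k) = \left( \frac{n}{k(n-k)} \right)^{1/2} w_n(k) \sum_{1 \le i \le k} \left( y_i - \bar y_n - \hat\beta_n (x_i - \bar x_n)^* \right),$$

where

$$w_n(k) = \left( 1 - k (\bar x_k - \bar x_n)(\bar x_k - \bar x_n)^* / (Q_n (1 - k/n)) \right)^{-1/2}.$$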

The functionals of $U_n(k)$ and $R_n(k)$ were used as the test statistics for detection of
change-points in regression relationships.
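As a hedged illustration of the cumulative sums of regression residuals described above, the following Python sketch treats only the simplest case of one scalar predictor plus an intercept. The names `cusum_residuals` and `argmax_abs` and the restriction to a scalar predictor are assumptions made for this example; they do not come from the cited papers.

```python
def cusum_residuals(x, y):
    """Cumulative sums R_n(k), k = 1..n, of least-squares residuals
    for the simple regression y_i = intercept + slope * x_i + e_i."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Q_n: sum of squared deviations of the predictor from its mean.
    q_n = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / q_n
    sums, running = [], 0.0
    for xi, yi in zip(x, y):
        running += yi - y_bar - slope * (xi - x_bar)
        sums.append(running)
    return sums


def argmax_abs(values):
    """1-based index k at which |values[k - 1]| is largest."""
    return max(range(len(values)), key=lambda k: abs(values[k])) + 1
```

For data with a shift in the intercept, the largest absolute cumulative sum tends to sit near the instant of the shift, which is the intuition behind using functionals of $R_n(k)$ as test statistics.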

Kim and Siegmund (1989) obtained the limit distribution of $\max_k |U_n(k)|$. Alternatively,
Maronna and Yohai (1978), and Worsley (1986) used the maximum likelihood
method for testing $H_0$ against $H_1$ for Gaussian errors. Later Gombay and Horvath
(1994) studied the limit distributions of statistics $Z_n(i,j) = \max_{i \le k \le j} |U_n(k)|$,
$T_n(i,j) = \max_{i \le k \le j} |R_n(k)|$ for deterministic and stochastic regression plans. The monograph by
Csorgo and Horvath (1997) puts together various results in detection of structural
changes in regression models.

Besides change-point detection problems, results in change-point estimation for
regressions are of particular practical importance. This theme is considered in papers by
Darkhovsky (1995), Huskova (1996), Horvath, Huskova, and Serbinovska (1997). In
the last two papers the asymptotic characteristics of change-point estimates based upon
the maximum likelihood statistics are studied. For the case of contiguous alternatives,
the limit distribution of the change-point estimates is obtained and weak and strong 
consistency of these estimates is proved. The paper by Darkhovsky (1995) develops 
the nonparametric approach to retrospective change-point estimation. Here the limit 
characteristics of change-point estimates in the functional regression model are studied 
without the contiguity assumption, and the rate of convergence of these estimates to 
the 'true' change-point parameters is estimated. Some generalizations of these results 
can be found in the monograph by Brodsky and Darkhovsky (2000). 

A new wave of research interest in change-point problems for regressions emerged
in the 2000s. Various generalizations were studied: change-point problems for
autoregressive time series (Huskova, Praskova, Steinebach (2007, 2008), Gombay (2008)),
multiple change-point estimation in non-stationary time series (Davis, Lee,
Rodriguez-Yam (2006)), and testing for change-points in the covariance structure of
linear processes (Berkes, Gombay, Horvath (2009)).

However, the result is a multitude of methods proposed for solving different
change-point problems in linear relationships, with almost no theoretical approaches to
their comparative analysis. We cannot even estimate the asymptotic efficiency of these
methods. All that is empirically observed for 'structural breaks' tests in statistics and 
econometrics can be reduced to the following 'vague' statement: the power of these 
methods is rather low. Let us agree that this 'practical conclusion' requires a more 
serious verification. 

In this paper, we pursue the following main goals: 

1) To prove the prior theoretical lower bounds for the error probability in change- 
point estimation in multivariate models. These bounds provide the theoretical basis 
for the proofs of the asymptotic optimality of change-point estimates and for the com- 
parative analysis of these estimates; 

2) To propose a new nonparametric method for the problem of retrospective change- 
point detection and estimation in multivariate linear systems. Then we study the main 
performance characteristics of this method: type 1 and type 2 errors, the error of 
change-point estimation. 

3) For the problem of multiple change-point detection and estimation, to propose a 
general statement in which both the number of change-points and their coordinates in 
the sample are unknown. For this problem statement, to propose a new asymptotically 
optimal method which gives consistent estimates of an unknown number of change- 
points and their coordinates. 

The structure of this paper is as follows. In Section 2 the general change-point 
problem for multivariate linear systems is formulated and general assumptions are 
given. In Section 3 we prove the prior informational inequalities for the main perfor- 
mance characteristic of the retrospective change-point problem, namely, the error of 
change-point estimation. The lower bounds for the error of estimation are found in 
different situations of change-point detection (deterministic and stochastic regression 
plan, multiple change-points). In Section 4 we propose a new method for the retrospec- 
tive change-point detection and estimation in multivariate linear models and study its 
main performance characteristics (type 1 and type 2 errors, the error of estimation) in 
different situations of change-point detection and estimation (dependent observations, 
deterministic and stochastic regression plan, multiple change-points). We prove that 
this method is asymptotically optimal by the order of convergence of change-point
estimates to their true values as the sample size tends to infinity. In Section 5 a variant of
the functional limit theorem in the case of absence of change-points is given. In Section 
6 a simulation study of characteristics of the proposed method for finite sample sizes is 
performed. The main goals of this study are as follows: to compare performance char- 
acteristics of the proposed method with characteristics of other well known methods 
of change-point detection in linear regression models, to consider more general multi- 
variate linear models and performance characteristics of the proposed method in these 
multivariate models. Section 7 contains main conclusions. All proofs are given in the 
Appendix. 

2 Problem statement and general assumptions 
2.1 General model 

The following basic specification of the multivariate system with structural changes is 
considered: 

$$Y(n) = \Pi X(n) + \nu_n, \quad n = 1, \dots, N, \qquad (1)$$

where $Y(n) = (y_{1n}, \dots, y_{Mn})^*$ is the vector of endogenous variables, $X(n) = (x_{1n}, \dots, x_{Kn})^*$
is the vector of pre-determined variables, $\Pi$ is an $M \times K$ matrix, $\nu_n = (\nu_{1n}, \dots, \nu_{Mn})^*$ is
the vector of random errors.

The matrix $\Pi = \Pi(\vartheta, n)$, $\vartheta = (\theta_1, \dots, \theta_k)$, can change abruptly at some unknown
change-points $m_i = [\theta_i N]$, $i = 1, \dots, k$ (here and below $[a]$ denotes the integer part of
a number $a$), i.e.,

$$\Pi(\vartheta, n) = \sum_{i=1}^{k+1} a_i I([\theta_{i-1} N] < n \le [\theta_i N]),$$

where $\theta_i$ are unknown change-point parameters such that $0 \equiv \theta_0 < \theta_1 < \dots < \theta_k <
\theta_{k+1} \equiv 1$, and $a_i \ne a_{i+1}$, $i = 1, \dots, k$, are unknown matrices (here and below $I(A)$ is the
indicator of the set $A$).

The problem is to estimate the unknown parameters $\theta_i$ (and therefore the
change-points $m_i$) by observations $Y(i), X(i)$, $i = 1, \dots, N$ (the case $\theta_i = 1$, $i = 1, \dots, k$,
corresponds to the model without change-points).
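To make the piecewise-constant structure of the coefficient matrix in model (1) concrete, here is a minimal simulation sketch. The function names `active_regime` and `simulate_model`, the Gaussian noise, and the callable `predictors` are assumptions introduced for illustration; the paper itself does not prescribe a simulation routine at this point.

```python
import random


def active_regime(n, N, thetas):
    """1-based index i of the regime with [theta_{i-1} N] < n <= [theta_i N]."""
    bounds = [0] + [int(theta * N) for theta in thetas] + [N]
    for i in range(1, len(bounds)):
        if bounds[i - 1] < n <= bounds[i]:
            return i
    raise ValueError("n must lie in 1..N")


def simulate_model(N, thetas, matrices, predictors, noise_sd, seed=0):
    """Draw Y(1), ..., Y(N) from Y(n) = Pi(n) X(n) + noise.

    matrices[i - 1] is the M x K matrix (list of rows) active in regime i;
    predictors(n, N) returns the K-vector X(n); the noise is assumed to be
    i.i.d. Gaussian with standard deviation noise_sd."""
    rng = random.Random(seed)
    observations = []
    for n in range(1, N + 1):
        pi = matrices[active_regime(n, N, thetas) - 1]
        x = predictors(n, N)
        y = [sum(row[c] * x[c] for c in range(len(x))) + rng.gauss(0.0, noise_sd)
             for row in pi]
        observations.append(y)
    return observations
```

With `thetas = [0.25, 0.75]` and `N = 100`, the three regimes cover the instants 1 to 25, 26 to 75, and 76 to 100.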

Therefore, first, we need to test an obtained dataset of observations for the presence 
of change-points. Second, in the case of a rejected stationarity hypothesis, we wish to
estimate all detected change-points.

Model (1) generalizes many widely used regression models, namely: 
a) autoregression model (AR)

$$y_n = c_0 + c_1 y_{n-1} + \dots + c_m y_{n-m} + \nu_n.$$

Here $X(n) = (1, y_{n-1}, \dots, y_{n-m})^*$, $\Pi = (c_0, c_1, \dots, c_m)$.

b) autoregression-moving average (ARMA) model

$$y_n = c_1 y_{n-1} + \dots + c_k y_{n-k} + d_1 u_{n-\Delta} + \dots + d_m u_{n-\Delta-m} + \nu_n,$$

where $u_n$ is the input variable, $y_n$ is the output variable at the instant $n$, $\Delta$ is the delay
time. Here $X(n) = (y_{n-1}, \dots, y_{n-k}, u_{n-\Delta}, \dots, u_{n-\Delta-m})^*$, $\Pi = (c_1, \dots, c_k, d_1, \dots, d_m)$.

c) multi-factor regression model

$$y_n = c_1 y_{n-1} + \dots + c_m y_{n-m} + \sum_{i=1}^{r} \sum_{j=1}^{l_i} d_{ij} x_i(n - j) + \nu_n,$$

where $r, m, l_i \ge 1$. Here $X(n) = (y_{n-1}, \dots, y_{n-m}, x_1(n-1), \dots, x_1(n-l_1), x_2(n-1),
\dots, x_2(n-l_2), \dots, x_r(n-1), \dots, x_r(n-l_r))^*$, $\Pi = (c_1, \dots, c_m, d_{11},
\dots, d_{r l_r})$.

d) simultaneous equation systems (SES)

$$B Y(n) + \Gamma X(n) = \epsilon_n,$$

where $Y(n) = (y_{1n}, y_{2n}, \dots, y_{Mn})^*$ is the vector of endogenous variables, $X(n) =
(x_{1n}, x_{2n}, \dots, x_{Kn})^*$ is the vector of pre-determined variables (all exogenous variables
plus lagged endogenous variables), $\epsilon_n = (\epsilon_{1n}, \epsilon_{2n}, \dots, \epsilon_{Mn})^*$ is the vector of random
errors, $B$ is an $M \times M$ non-degenerate matrix ($\det B \ne 0$), $\Gamma$ is an $M \times K$ matrix.

This general structural form of the SES can be written in the following reduced
form:

$$Y(n) = -B^{-1} \Gamma X(n) + B^{-1} \epsilon_n = \Pi X(n) + \nu_n.$$

This system is usually used for the analysis of change-points (structural changes) 
in multivariate linear models (see, e.g., Bai, Lumsdaine, Stock (1998)). 
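The passage from the structural form to the reduced form can be checked numerically. The sketch below is a small illustration for $M = 2$ endogenous variables, using a hand-written $2 \times 2$ inverse; the helper names `inverse_2x2`, `matmul`, and `reduced_form` are assumptions made for this example.

```python
def inverse_2x2(m):
    """Inverse of a non-degenerate 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix B must be non-degenerate")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def reduced_form(b_matrix, gamma):
    """Reduced-form coefficient matrix Pi = -B^{-1} Gamma (2 x K)."""
    product = matmul(inverse_2x2(b_matrix), gamma)
    return [[-value for value in row] for row in product]
```

By construction the result satisfies $B \Pi + \Gamma = 0$, so a structural change in $B$ or $\Gamma$ shows up as a change in $\Pi$.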

2.2 General assumptions 

In this subsection we formulate general assumptions which will be used in our main 
theorems 3-5. Some specific assumptions will be formulated together with the corre- 
sponding theorems. 
Let us start from the following definitions. Consider the probability space $(\Omega, \mathcal{F}, P)$.
Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two $\sigma$-algebras from $\mathcal{F}$. Consider the following measure of dependence
between $\mathcal{H}_1$ and $\mathcal{H}_2$:

$$\psi(\mathcal{H}_1, \mathcal{H}_2) = \sup_{A \in \mathcal{H}_1, B \in \mathcal{H}_2, P(A)P(B) \ne 0} \left| \frac{P(AB)}{P(A)P(B)} - 1 \right|.$$

Suppose $(X_n, n \ge 1)$ is a sequence of random vectors defined on $(\Omega, \mathcal{F}, P)$. Denote
by $\mathcal{F}_s^t = \sigma\{X_i : s \le i \le t\}$, $1 \le s \le t \le \infty$, the minimal $\sigma$-algebra generated by random
vectors $X_i$, $s \le i \le t$. Define

$$\psi(n) = \sup_{t \ge 1} \psi(\mathcal{F}_1^t, \mathcal{F}_{t+n}^\infty).$$



A) Mixing condition

We say that a scalar random sequence $\{x_n\}$ satisfies the $\psi$-mixing condition if the
function $\psi(n)$ (which is also called the $\psi$-mixing coefficient) tends to zero as $n$ goes to
infinity.

We say that a vector random sequence $\{X(n)\}$, $X(n) = (x_1(n), \dots, x_k(n))^*$, satisfies
the uniform $\psi$-mixing condition if $\max_{i,j} \psi_{ij}(n)$ tends to zero as $n$ goes to infinity, where
$\psi_{ij}(n)$ is the $\psi$-mixing coefficient for the sequence $\{x_i(n) x_j(n)\}$.

The $\psi$-mixing condition is satisfied in most practical situations of change-point
detection. In particular, for a Markov chain (not necessarily stationary), if $\psi(n) < 1$
for a certain $n$, then $\psi(k)$ goes to zero at least exponentially as $k \to \infty$ (see Bradley,
2005, theorem 3.3).

B) Cramer condition

Let $\{\zeta(n)\}$, $\zeta(n) = (\zeta_1(n), \dots, \zeta_k(n))^*$, be a vector random sequence. We say that
the uniform Cramer condition is satisfied if there exists a constant $L > 0$ such that

$$\sup_n E \exp(t \zeta_i(n) \zeta_j(n)) < \infty$$

for every $i, j = 1, \dots, k$ and $|t| < L$.

For a centered random sequence $\{\xi_n\}$ this condition is equivalent to the following:
there exist constants $g > 0$, $T > 0$ such that for each $|t| < T$:

$$\sup_n E e^{t \xi_n} \le \exp\left( \frac{g t^2}{2} \right).$$
3 Preliminary results: prior inequalities



3.1 Unique change-point 

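On a probability space $(\Omega, \mathcal{F}, P_\theta)$ consider a sequence of i.r.v.'s $X_1, \dots, X_N$ with the
following density function (w.r.t. some $\sigma$-finite measure $\mu$):

$$f(x_n) = \begin{cases} f_0(x_n, n/N), & 1 \le n \le [\theta N], \\ f_1(x_n, n/N), & [\theta N] < n \le N. \end{cases} \qquad (2)$$

Here $0 < \theta < 1$ is an unknown change-point parameter.
Define the following objects:

$$T_N(\Delta) : \mathbb{R}^N \to \Delta \subset \mathbb{R}^1 \qquad (3)$$

is the Borel function on $\mathbb{R}^N$ with the values in the set $\Delta$;

$$\mathcal{M}_N(\Delta) = \{T_N(\Delta)\} \qquad (4)$$

is the collection of all Borel functions $T_N$.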

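Theorem 1. Suppose the following assumption is satisfied:

the functions $J_0(t) = E_0 \ln \dfrac{f_0(x,t)}{f_1(x,t)}$ and $J_1(t) = E_1 \ln \dfrac{f_1(x,t)}{f_0(x,t)}$ are continuous on
$[0, 1]$ and such that

$$J_0(t) \ge \delta > 0, \quad J_1(t) \ge \delta > 0.$$

Then for any fixed $0 < \theta < 1$, $0 < \epsilon < \theta \wedge (1 - \theta)$ the following inequality holds:

$$\liminf_{N \to \infty} N^{-1} \ln \inf_{\hat\theta_N \in \mathcal{M}_N((0,1))} \sup_{0 < \theta < 1} P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \ge - \min\left( \int_\theta^{\theta + \epsilon} J_0(t) dt, \int_{\theta - \epsilon}^{\theta} J_1(t) dt \right).$$

The proof of this theorem is given in the Appendix A.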

Remark 1. The lower bound in Theorem 1 cannot be improved essentially. This follows
from the results of Korostelev (1997), where the exact lower bound for the change-point
estimate in the continuous-time model for the Wiener process was given. The exact
lower bound in Korostelev (1997) differs from our bound only by a constant factor.



Consider the following particular cases of model (2). 
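1. A break in the trend function $\varphi(t)$ of the mathematical expectation of Gaussian
observations

Let

$$f_0(x,t) = h(x) \exp(\varphi_0(t) x - \varphi_0^2(t)/2), \quad t \le \theta,$$
$$f_1(x,t) = h(x) \exp(\varphi_1(t) x - \varphi_1^2(t)/2), \quad t > \theta,$$

where $h(x) = \dfrac{1}{\sqrt{2\pi}} \exp(-x^2/2)$, $\varphi_0(\cdot) \ne \varphi_1(\cdot)$.

In this case from Theorem 1 we obtain the following lower bound for the error
probability:

$$P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \ge (1 - o(1)) \exp\left( -\frac{N}{2} \min\left( \int_\theta^{\theta+\epsilon} (\varphi_0(t) - \varphi_1(t))^2 dt, \int_{\theta-\epsilon}^{\theta} (\varphi_0(t) - \varphi_1(t))^2 dt \right) \right).$$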

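2. Linear regression with deterministic predictors and Gaussian errors
Let

$$y_n = c_1(n) x_{1n} + \dots + c_k(n) x_{kn} + \xi_n, \quad n = 1, \dots, N, \qquad (5)$$

where $\{\xi_n\}$ is a sequence of independent Gaussian r.v.'s with zero mean, $\xi_n \sim \mathcal{N}(0, \sigma^2)$,
$c(n) = (c_1(n), \dots, c_k(n))^* = a I(n \le [\theta N]) + b I(n > [\theta N])$, $a = (a_1, \dots, a_k)^* \ne
b = (b_1, \dots, b_k)^*$, $x_{in} = f_i(n/N)$, $n = 1, \dots, N$, and $f_i(\cdot) \in C[0, 1]$, $i = 1, \dots, k$.

In this case from Theorem 1 applied to the sequence of observations $y_1, \dots, y_N$ we
obtain:

$$P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \ge (1 - o(1)) \exp\left( -\frac{N}{2\sigma^2} \min\left( \int_\theta^{\theta+\epsilon} \Big( \sum_{i=1}^{k} f_i(t)(a_i - b_i) \Big)^2 dt, \int_{\theta-\epsilon}^{\theta} \Big( \sum_{i=1}^{k} f_i(t)(a_i - b_i) \Big)^2 dt \right) \right).$$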



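3. Linear stochastic regression model with Gaussian predictors
Consider model (5) with $\xi_n \equiv 0$. Suppose that there exist continuous functions
$f_i(\cdot)$, $\sigma_i(\cdot)$, $i = 1, \dots, k$, such that $x_{in}$ are Gaussian i.r.v.'s, $x_{in} \sim \mathcal{N}(f_i(n/N), \sigma_i^2(n/N))$,
$n = 1, \dots, N$. Suppose also that $x_{in}$ and $x_{jn}$ are independent for $i \ne j$ and $c(n)$ is the
same as in model (5).

Then from Theorem 1 we obtain:

$$P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \ge (1 - o(1)) \exp\left( -N \min\left( \int_\theta^{\theta+\epsilon} J_0(t) dt, \int_{\theta-\epsilon}^{\theta} J_1(t) dt \right) \right),$$

where

$$J_0(t) = \frac{1}{2} \left[ 2 \ln \frac{\Delta_1(t)}{\Delta_0(t)} + \frac{\Delta_0^2(t)}{\Delta_1^2(t)} + \frac{(\varphi_0(t) - \varphi_1(t))^2}{\Delta_1^2(t)} - 1 \right],$$

$$J_1(t) = \frac{1}{2} \left[ 2 \ln \frac{\Delta_0(t)}{\Delta_1(t)} + \frac{\Delta_1^2(t)}{\Delta_0^2(t)} + \frac{(\varphi_0(t) - \varphi_1(t))^2}{\Delta_0^2(t)} - 1 \right],$$

and

$$\varphi_0(t) = a_1 f_1(t) + \dots + a_k f_k(t), \quad \Delta_0^2(t) = a_1^2 \sigma_1^2(t) + \dots + a_k^2 \sigma_k^2(t),$$
$$\varphi_1(t) = b_1 f_1(t) + \dots + b_k f_k(t), \quad \Delta_1^2(t) = b_1^2 \sigma_1^2(t) + \dots + b_k^2 \sigma_k^2(t).$$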
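In the Gaussian cases above, the functions $J_0$ and $J_1$ of Theorem 1 are Kullback-Leibler divergences between normal laws, so the exponential rate in the lower bound can be evaluated numerically. The following sketch is one hedged way to do this; the function names, the midpoint integration rule, and the number of grid steps are assumptions chosen for illustration.

```python
import math


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """E_p ln(p/q) for p = N(mean_p, var_p) and q = N(mean_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + var_p / var_q
                  + (mean_p - mean_q) ** 2 / var_q - 1.0)


def integrate(func, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (j + 0.5) * h) for j in range(steps))


def lower_bound_rate(j0, j1, theta, eps):
    """min( int_theta^{theta+eps} J_0, int_{theta-eps}^{theta} J_1 )."""
    return min(integrate(j0, theta, theta + eps),
               integrate(j1, theta - eps, theta))
```

For example, for unit-variance observations whose mean jumps from 0 to 1, both divergences equal 1/2, so with `eps = 0.1` the rate is 0.05 and no estimate can have an error probability that decays faster than roughly $\exp(-0.05 N)$.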

3.2 Multiple change-points 

Theorem 1 can be generalized to the case of several change-points in the sequence of
independent r.v.'s with the following density function:

$$f(x_n) = f_{i-1}(x_n, n/N) I([\theta_{i-1} N] < n \le [\theta_i N]), \quad n = 1, \dots, N,$$

where $i = 1, \dots, k + 1$ and $0 \equiv \theta_0 < \theta_1 < \dots < \theta_k < \theta_{k+1} \equiv 1$.
Suppose the following assumptions are satisfied:

i) change-points $\theta_i$ are such that $\min_{1 \le i \le k+1} (\theta_i - \theta_{i-1}) \ge \delta > 0$.

ii) the functions $J^i(t) = E_i \ln \dfrac{f_i(x,t)}{f_{i-1}(x,t)}$ and $J^{i-1}(t) = E_{i-1} \ln \dfrac{f_{i-1}(x,t)}{f_i(x,t)}$, $i = 1, \dots, k$,
are continuous on $[0, 1]$ and such that

$$J^i(t) \ge \Delta > 0, \quad i = 1, \dots, k.$$

For the multiple change-point problem we estimate both the number $k$ and the
vector $\vartheta = (\theta_1, \dots, \theta_k)$ of change-points' coordinates. Let $s^* = [1/\delta]$ and denote

$$Q = \{1, 2, \dots, s^*\}.$$

For any $s \in Q$ define

$$\mathcal{D}_s = \{x \in \mathbb{R}^s : \delta \le x_i \le 1 - \delta, \ x_{i+1} - x_i \ge \delta, \ x_0 = 0, \ x_{s+1} = 1\}. \qquad (6)$$

By the construction, an unknown vector $\vartheta$ is an arbitrary point of the set $\mathcal{D}_k$ and an
unknown number of the change-points $k$ is an arbitrary point of the set $Q$.

As before, it is reasonable to consider objects (3)-(4). In this notation $\mathcal{M}_N(\mathcal{D}_k)$
is the set of all arbitrary estimates of the parameter $\vartheta$ and $\mathcal{M}_N(Q)$ is the set of all
arbitrary estimates of the parameter $k$ on the basis of observations with the sample
size $N$.
Let $\hat k \in \mathcal{M}_N(Q)$ be an estimate of an unknown number of change-points $k$ and
$\hat\vartheta \in \mathcal{M}_N(\mathcal{D}_k)$ be an estimate of unknown change-point coordinates on condition that
the number of the coordinates was estimated correctly.

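Theorem 2. Suppose assumptions i) and ii) are satisfied. Then for any fixed $0 < \epsilon < \delta$
the following inequality holds:

$$\liminf_{N \to \infty} N^{-1} \ln \inf_{\hat k \in \mathcal{M}_N(Q)} \inf_{\hat\vartheta \in \mathcal{M}_N(\mathcal{D}_k)} \sup_{k \in Q} \sup_{\vartheta \in \mathcal{D}_k} P_\vartheta\Big\{ \{\hat k \ne k\} \cup \big\{ (\hat k = k) \cap (\max_{1 \le i \le k} |\hat\theta_i - \theta_i| > \epsilon) \big\} \Big\}$$
$$\ge - \min_{1 \le i \le k} \min\left( \int_{\theta_i}^{\theta_i + \epsilon} J^{i-1}(\tau) d\tau, \int_{\theta_i - \epsilon}^{\theta_i} J^i(\tau) d\tau \right).$$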

The proof of this theorem is given in the Appendix B. 



4 Main results 

Now consider model (1). In this Section we assume that the uniform mixing condition
(A) and the uniform Cramer condition (B) (see Section 2) are satisfied, and an unknown
vector of change-point parameters $\vartheta = (\theta_1, \dots, \theta_k)$ is such that $0 < \beta \le \theta_1 < \theta_2 <
\dots < \theta_k \le \alpha < 1$, where $\beta$, $\alpha$ are known numbers. Everywhere below the measure $P_\vartheta$
corresponds to a sample with change-points ($P_0$ corresponds to a sample without
change-points).

4.1 Unique change-point

In this subsection model (1) with a unique change-point $0 < \beta \le \theta \le \alpha < 1$ is considered.
4.1.1 Deterministic predictors 

Let us formulate assumptions for model (1) in the case of a unique change-point (recall
that in model (1) the vector $X(n)$ has the dimension $K$ and the vector $Y(n)$ has the
dimension $M$):

a) the vector random sequence $\{\nu_n\}$ satisfies conditions (A) and (B) (see Section 2).

b) there exist functions $f_i(\cdot) \in C[0, 1]$, $i = 1, \dots, K$, such that $x_{in} = f_i(n/N)$, $n =
1, \dots, N$.

Denote $F(t) = (f_1(t), \dots, f_K(t))^*$, $t \in [0, 1]$.

c) for arbitrary $0 \le t_1 < t_2 \le 1$, the matrix

$$A(t_1, t_2) = \int_{t_1}^{t_2} F(s) F^*(s) ds$$

is positive definite (below we denote $A(t) = A(0,t)$, $A(1) = I$).

By virtue of our assumptions, the matrix $I$ is symmetric and positive definite.
Define the $K \times M$ matrix

$$Z(n_1, n_2) = N^{-1} \sum_{i=n_1}^{n_2} F(i/N) Y^*(i)$$

and the $K \times K$ matrix

$$\mathcal{P}_{n_1}^{n_2} = N^{-1} \sum_{k=n_1}^{n_2} F(k/N) F^*(k/N), \quad 1 \le n_1 \le n_2 \le N.$$

The following matrix statistic is used for estimation of an unknown change-point:

$$Z_N(n) = Z(1, n) - \mathcal{P}_1^n (\mathcal{P}_1^N)^{-1} Z(1, N). \qquad (7)$$

An arbitrary point $\hat n$ of the set $\arg\max_{[\beta N] \le n \le [\alpha N]} \|Z_N(n)\|^2$ is assumed to be the
estimate of an unknown change-point (here and below $\|C\|$ denotes the Hilbert-Schmidt norm of a
matrix $C$, namely $\|C\| = \sqrt{\operatorname{tr}(C C^*)}$).

We also define the value $\hat\theta_N = \hat n / N$, the estimate of the change-point parameter $\theta$.
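A minimal sketch of statistic (7) and of the resulting estimate may help to fix ideas. It is restricted to one predictor and one dependent variable ($K = M = 1$), so that all matrices reduce to numbers; this restriction and the names `statistic_7` and `estimate_change_point` are assumptions made only for this example.

```python
def statistic_7(f_values, y):
    """Values Z_N(n), n = 1..N, of statistic (7) for K = M = 1.

    f_values[n - 1] plays the role of F(n/N), y[n - 1] of Y(n)."""
    N = len(y)
    total_fy = sum(f * v for f, v in zip(f_values, y))
    total_ff = sum(f * f for f in f_values)
    z, cum_fy, cum_ff = [], 0.0, 0.0
    for f, v in zip(f_values, y):
        cum_fy += f * v
        cum_ff += f * f
        # Partial sum minus its prediction from the whole-sample fit.
        z.append((cum_fy - cum_ff / total_ff * total_fy) / N)
    return z


def estimate_change_point(f_values, y, beta, alpha):
    """Return (n_hat, theta_hat) maximising Z_N(n)^2 over [beta N] <= n <= [alpha N]."""
    N = len(y)
    z = statistic_7(f_values, y)
    candidates = range(max(1, int(beta * N)), int(alpha * N) + 1)
    n_hat = max(candidates, key=lambda n: z[n - 1] ** 2)
    return n_hat, n_hat / N
```

For a noise-free sample whose regression coefficient changes after the 60th of 100 observations and a constant predictor, the call `estimate_change_point([1.0] * 100, [0.0] * 60 + [1.0] * 40, 0.1, 0.9)` locates the change at $n = 60$.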
Denote $B = B(\theta) = (E - I^{-1} A(\theta)) (a - b)^*$.

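Theorem 3. Suppose assumptions a)-c) are satisfied and $\operatorname{rank}(B) = M$ if $\theta \in [\beta, \alpha]$.

Then the estimate $\hat\theta_N$ converges to the change-point parameter $\theta$ $P_\theta$-almost surely
as $N \to \infty$.

Besides, for any fixed $\epsilon$, $(\alpha - \beta) > \epsilon > 0$, the following inequality is satisfied for
$N > N_0(F)$:

$$\sup_{\beta \le \theta \le \alpha} P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \le m_0(C(\epsilon,N)/\mathcal{R}) \begin{cases} \exp\left( -\dfrac{N \beta (C(\epsilon,N)/\mathcal{R})^2}{4 g m_0(C(\epsilon,N)/\mathcal{R})} \right), & \text{if } C(\epsilon,N) \le \mathcal{R} g T, \\[2ex] \exp\left( -\dfrac{T N \beta (C(\epsilon,N)/\mathcal{R})}{4 m_0(C(\epsilon,N)/\mathcal{R})} \right), & \text{if } C(\epsilon,N) > \mathcal{R} g T, \end{cases} \qquad (8)$$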

where the constants g, T, mo(-) > 1 are taken from the uniform Cramer's and ip- 
mixing conditions, respectively, C{e,N) = ||a — b||^ — L^/A^ , Nq{F), Xp, Lp^ TZ 
are constants which can be exactly calculated for any given family of functions F{t), 
and the constant Ai is given in the proof. 
Remark 2. The assumption $\operatorname{rank} B = M$ yields $K \ge M$, i.e., the number $M$ of
endogenous variables in (1) cannot exceed the number $K$ of pre-determined variables.
Note that for one regression equation this assumption is always satisfied.

Remark 3. For independent random errors $m_0(\cdot) = 1$.

Remark 4. Comparing Theorems 1 and 3, we conclude that the order of convergence of
the proposed estimate of the change-point parameter to its true value is asymptotically
optimal as $N \to \infty$.

Remark 5. For any given family of functions $F(t)$ one can calculate the function
$f(t) = \|m(t)\|^2$, $m(t) = \lim_{N \to \infty} E_\theta Z_N([Nt])$ (see the proof) and investigate this function
on the square $(\theta, t) \in [\beta, \alpha] \times [\beta, \alpha]$. Such investigation makes it possible to calculate
all constants from the formulation.

The proof of Theorem 3 is given in the Appendix C. 
From the proof we obtain the following 

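Corollary 1. Let $C > 0$ be the decision threshold and $\tilde C \stackrel{\mathrm{def}}{=} C - L_F/N$. Then:

- for type 1 error the following inequality is satisfied:

$$P_0\Big\{ \max_{[\beta N] \le n \le [\alpha N]} \|Z_N(n)\|^2 > C \Big\} \le m_0(\tilde C/\mathcal{R}) \begin{cases} \exp\left( -\dfrac{T N \beta \tilde C}{4 \mathcal{R} m_0(\tilde C/\mathcal{R})} \right), & \text{if } \tilde C > \mathcal{R} g T, \\[2ex] \exp\left( -\dfrac{N \beta \tilde C^2}{4 \mathcal{R}^2 g m_0(\tilde C/\mathcal{R})} \right), & \text{if } \tilde C \le \mathcal{R} g T; \end{cases} \qquad (9)$$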

for type 2 error the following inequality is satisfied: 



exp 



TN(5d \ ^^^^ 
where d = 7^-l ^||m(^)|| - C- > 0, ||m(^)||2 = tr{B* A^{e)B) . 
4.1.2 Stochastic predictors

In this subsection we suppose that predictors $x_{in}$ in (1) are random. On the probability
space $(\Omega, \mathcal{F}, P_\theta)$ consider a filtration $\{\mathcal{F}_n\}$, $n = 1, \dots, N$, where $\{\mathcal{F}_n\} \subset \mathcal{F}$; $\mathcal{F}_n$ can be
interpreted as all available information up to the instant $n$.

Put $X(n) = (x_{1n}, \dots, x_{Kn})^*$.

Suppose that the following conditions are satisfied:

a) there exists a continuous symmetric matrix function $V(t)$, $t \in [0,1]$, such that
the matrix $\int_{t_1}^{t_2} V(s) ds$ is positive definite for any $0 \le t_1 < t_2 \le 1$, and $E_\theta X(n) X^*(n) =
V(n/N)$;

b) the sequence of random vectors $\{(X(n), \nu_n)\}$ satisfies the uniform Cramer's and
$\psi$-mixing conditions;

c) the random sequence $\{\nu_n\}$ is a martingale-difference sequence w.r.t. the filtration
$\{\mathcal{F}_n\}$;

d) the vector of predictors $X(n) = (x_{1n}, \dots, x_{Kn})^*$ is $\mathcal{F}_{n-1}$-measurable.
On the segment $[0, 1]$ define the $K \times M$ matrix process

$$\Pi_N(t) = \sum_{i=1}^{[Nt]} X(i) Y^*(i),$$

and the $K \times K$ matrix process

$$\mathcal{T}_N(t) = \sum_{k=1}^{[Nt]} X(k) X^*(k).$$

By virtue of conditions a), b), c), the matrix process $N^{-1} \mathcal{T}_N(t)$ weakly converges
(in the Skorokhod space) to a positive definite symmetric matrix function $\mathbf{M}(t) =
\int_0^t V(s) ds$, and the rate of convergence is exponential. Below we denote $\mathbf{M}(1) = \mathbf{M}$.

Analogously, due to conditions a)-d), the matrix process $N^{-1} \sum_{k=1}^{[Nt]} X(k) \nu^*(k)$ weakly
converges to zero with the exponential rate. Both conclusions follow from the fact that
the random processes

$$N^{-1} \sum_{n=1}^{[Nt]} (x_{in} x_{jn} - E_\theta x_{in} x_{jn}), \quad N^{-1} \sum_{n=1}^{[Nt]} x_{in} \nu_{jn}, \quad i, j = 1, \dots, k,$$

weakly converge to zero (as $N \to \infty$) with the exponential rate (see Brodsky,
Darkhovsky (2000)).
For estimation of an unknown change-point, the following statistic is used:

$$Z_N(n) = N^{-1} \left( \Pi_N(n/N) - \mathcal{T}_N(n/N) (\mathcal{T}_N(1))^{-1} \Pi_N(1) \right), \quad n = 1, 2, \dots, N. \qquad (10)$$

An arbitrary point $\hat n$ of the set $\operatorname{Arg} \max_{[\beta N] \le n \le [\alpha N]} \|Z_N(n)\|^2$ is assumed to be the
estimate of an unknown change-point. Again we define $\hat\theta_N = \hat n / N$ as the estimate of the
change-point parameter $\theta$.

Statistic (10) generalizes statistic (7) to the situation of stochastic predictors.
Assumptions a)-d) guarantee the analogous properties of this statistic. In particular, the
limit value (as $N \to \infty$) of the mathematical expectation of the statistic $Z_N([Nt])$
attains its unique global maximum on the segment $[0, 1]$ at the point $t^* = \theta$.

Assumptions a)-d) guarantee convergence in probability of an arbitrary point of
$\operatorname{Arg} \max_{[\beta N] \le n \le [\alpha N]} \|Z_N(n)\|^2$ to the point $\theta$ with the exponential rate. Hence the $P_\theta$-a.s.
convergence of the proposed estimate to $\theta$ follows.
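The autoregression model a) of Section 2 is a natural example of stochastic predictors. The sketch below applies the scalar version of statistic (10) to a simulated AR(1) sequence whose coefficient changes once. The function names, the Gaussian innovations, the omission of the intercept, and all numerical values are assumptions chosen for illustration.

```python
import random


def statistic_10(x, y):
    """Values Z_N(n), n = 1..N, of statistic (10) for K = M = 1."""
    N = len(y)
    total_xy = sum(a * b for a, b in zip(x, y))
    total_xx = sum(a * a for a in x)
    z, cum_xy, cum_xx = [], 0.0, 0.0
    for a, b in zip(x, y):
        cum_xy += a * b
        cum_xx += a * a
        z.append((cum_xy - cum_xx / total_xx * total_xy) / N)
    return z


def simulate_ar1(N, theta, coef_before, coef_after, seed=0):
    """AR(1) without intercept; the coefficient changes after n = [theta N].

    Returns predictors x_n = y_{n-1} and responses y_n, n = 1..N (y_0 = 0)."""
    rng = random.Random(seed)
    previous, xs, ys = 0.0, [], []
    for n in range(1, N + 1):
        coef = coef_before if n <= int(theta * N) else coef_after
        current = coef * previous + rng.gauss(0.0, 1.0)
        xs.append(previous)
        ys.append(current)
        previous = current
    return xs, ys


def estimate_change_point(x, y, beta, alpha):
    """Return (n_hat, theta_hat) maximising Z_N(n)^2 over [beta N] <= n <= [alpha N]."""
    N = len(y)
    z = statistic_10(x, y)
    candidates = range(max(1, int(beta * N)), int(alpha * N) + 1)
    n_hat = max(candidates, key=lambda n: z[n - 1] ** 2)
    return n_hat, n_hat / N
```

Because the predictor $y_{n-1}$ is known at the instant $n - 1$, it matches condition d); the estimate obtained for a long sample should be close to the true parameter, although its exact value depends on the random draw.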

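Theorem 4. Suppose that the conditions a)-d) are satisfied and $\operatorname{rank}(\mathbf{B}) = M$ if $\theta \in
[\beta, \alpha]$, where

$$\mathbf{B} \stackrel{\mathrm{def}}{=} (E - \mathbf{M}^{-1} \mathbf{M}(\theta)) (a - b)^*.$$

Then the estimate $\hat\theta_N$ of the change-point parameter $\theta$ converges to $\theta$ $P_\theta$-a.s. as
$N \to \infty$.

Besides, there exists the number $N_1 = N_1(\{X(n)\})$ such that for $N > N_1$ and any
fixed $\epsilon$, $(\min((\alpha - \beta), \|R\|/2) > \epsilon > 0)$, the following inequality holds:

$$\sup_{\beta \le \theta \le \alpha} P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \le \delta_N(\epsilon) + m_0(C(\epsilon,N)/R) \begin{cases} \exp\left( -\dfrac{N \beta (C(\epsilon,N)/R)^2}{4 g m_0(C(\epsilon,N)/R)} \right), & \text{if } C(\epsilon,N) \le R g T, \\[2ex] \exp\left( -\dfrac{T N \beta (C(\epsilon,N)/R)}{4 m_0(C(\epsilon,N)/R)} \right), & \text{if } C(\epsilon,N) > R g T, \end{cases}$$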



where C(e, A^) 



reAv 
4M 



max ||M(t)||, the constants g,T,mQ{-) are 

fi<t<a 



taken from the uniform Cramer's and ifj-mixing conditions, and M(t), Av, Ly, 5jv, R 
are described in the proof. 

In particular, for independent observations $m_0(\cdot) = 1$.

Comparing Theorems 1 and 4, we conclude that the order of convergence of the
proposed estimate of the change-point parameter to its true value is asymptotically
optimal as $N \to \infty$.
The proof of Theorem 4 is given in the Appendix D.
From the proof we obtain the following 

def Ly 

Corollary 2. Let S > be the decision threshold and E> = S . Then: 



for type 1 error the following inequality is satisfied: 



Pol max ||Z^(n)f > 5} < 5^(S) +mo(S/R) < 

[l3N]<n<[aN] 



exp 



Tmi3 



exp 



4Rmo (§/R) 



4R2^mo (§/R) 
<K9T. 



- for type 2 error the following inequality holds: 



Pe{ max \\Z^{n)f < S} < (5^(§) + mo(r) { 

ll3N]<n<laN] 



exp 



TN(3r 



4i?mo(r) 
r > RgT 

/ Nf3r^ \ 
""""P \AR^gmoid)) 
r < RgT, 



where r = R-^ (||Af(^)|| - S- Ly) > 0; \\M{e)f = tr(B*M2(e)I 



4.2 Multiple change-points 

The proposed method can be generalized to problems of detection and estimation 
of multiple change-points in regression models. A widespread approach to solving 
these problems (see, e.g., Bai, Lumsdaine, Stock (1998)) consists in decomposition of 
the whole obtained sample to all possible subsamples and construction of regression 
estimates for each of these subsamples. The decomposition for which the minimum of 
the general sum of regression residuals is attained, is assumed to be the estimate of 
a true decomposition of the whole samples of obtained observations into subsamples 
with different regression regimes. 

These methods turn out to be rather time-consuming and have low power. For
example, if there are only two regression regimes in an obtained sample but we do not
know this fact and are obliged to try all possible subsamples up to the order 20, then
many false structural changes will be obtained.
In this paper we propose a new method of detection and estimation of multiple
change-points which is not based upon LSE of regression parameters and computation
of corresponding residuals. This method is more effective and robust to possible
inaccuracies in specification of regression models.

Let us explain the idea of this method by the following example of a multiple
regression model (1) with deterministic predictors and the row-matrix $\Pi(\vartheta, n)$. In other
words, let $\vartheta = (\theta_1, \theta_2, \dots, \theta_k)$, $k \ge 1$, be an unknown vector of change-point parameters
such that $0 \equiv \theta_0 < \beta \le \theta_1 < \dots < \theta_k \le \alpha < \theta_{k+1} \equiv 1$, where, as before, $\beta$, $\alpha$ are
known numbers, and the observations have the form

$$y_n = \Pi^*(\vartheta, n) F(n/N) + \nu_n. \qquad (11)$$

Here

$$\Pi(\vartheta, n) = \sum_{i=1}^{k+1} a_i I([\theta_{i-1} N] < n \le [\theta_i N]),$$

where $a_i \ne a_{i+1}$, $i = 1, 2, \dots, k$, are unknown vectors, $F(t)$ is a given vector-function
(for all assumptions and notation see Subsection 4.1.1).

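Consider our main statistic (7). The mathematical expectation of this statistic
converges as $N \to \infty$ to the function

$$m(t) = \int_0^t F(s) F^*(s) \Pi(\vartheta, s) ds - A(t) I^{-1} \int_0^1 F(s) F^*(s) \Pi(\vartheta, s) ds,$$

where $\Pi(\vartheta, s)$ denotes the coefficient vector $a_i$ that is active for $\theta_{i-1} < s \le \theta_i$.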

In the situation when there are no change-points, i.e., the vector of regression
coefficients is constant on $[0, 1]$, the vector function $m(t)$ equals zero for each $t \in [0, 1]$.
This property of $m(t)$ makes it possible to effectively reject the null hypothesis about
the absence of change-points when they are really present in an obtained sample.

Consider the following method of detection and estimation of multiple
change-points. Fix a small parameter $\epsilon$, $\min(\beta, 1 - \alpha) > \epsilon > 0$. The proposed method consists
of the following steps:

1. Compute statistic (7) by the data in the range of arguments $\mathcal{N} = ([\beta N], \dots, [\alpha N])$.
If $\max_{n \in \mathcal{N}} \|Z_N(n)\|^2 > C$, where $C = C(N)$ is the decision threshold, then compute
$n_{max} = \arg\max_{n \in \mathcal{N}} \|Z_N(n)\|^2$; otherwise the sample is assumed to be stationary (without
change-points).

2. Put $N' = n_{max} - [\epsilon N]$ and compute statistic (7) by the data in the range of
arguments $\mathcal{N}' = ([\beta N], \dots, N')$ according to step 1. This cycle is repeated until:
1) we obtain a stationary sub-sample in the range of data with arguments
$([\beta N], \dots, N')$, i.e. $\max_{n \in \mathcal{N}'} \|Z_{N'}(n)\|^2 < C(N')$. Then we put $\hat n(1) = N' + [\epsilon N]$ as
the estimate of the first change-point and go to step 3;
or
2) we obtain a sample of the size $N' < [2\epsilon N]$. Then we put $\hat n(1) = N' + [\epsilon N]$ as
the estimate of the first change-point and go to step 3.

3. Put $n' = \hat n(1) + [\epsilon N]$ and compute statistic (7) by the data in the range of
arguments $(n', \dots, [\alpha N])$ (i.e. with the relative arguments $[1, \dots, [\alpha N] - n' + 1]$) and
proceed according to steps 1 and 2. The cycle is repeated until we obtain a stationary
sub-sample in the range of data with arguments $[n', \dots, n_{max}]$ or $n_{max} - n' < [2\epsilon N]$.
Then we put $\hat n(2) = n_{max}$ as the estimate of the next change-point. If $N - \hat n(2) <
[2\epsilon N]$ then stop, otherwise repeat step 3 by the data in the range of arguments
$(\hat n(2), \dots, [\alpha N])$.

In this way we continue to compute the estimates $\hat n(3), \dots$ of change-points. As
a result we obtain the series of estimates $\hat n(1), \hat n(2), \dots$ of the true change-points
$[\theta_1 N], \dots, [\theta_k N]$. The number $\hat k_N$ of these estimates is determined by the quantity
of stationary sub-samples

$$[1, \dots, \hat n(1)], \dots, [\hat n(i), \dots, \hat n(i + 1)], \dots, [\hat n(\hat k_N), \dots, N].$$

The proposed method is based upon reduction to the case of only one change-point
and the properties of the matrix $m(t)$. The crucial point of this method is the choice
of the decision threshold $C(N)$ which depends on the sample size $N$. Below we give
an explicit formula for computation of $C(N)$.
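The reduction to the one-change-point case can be sketched in code. The following Python fragment is a deliberately simplified variant of the steps above and not the exact procedure of the paper: it assumes a scalar sequence with a constant predictor (so that statistic (7) reduces to a CUSUM of deviations from the window mean), compares the absolute value of the statistic with a fixed threshold instead of $C(N)$, and omits the trimming by $\beta$ and $\alpha$. The names `max_cusum` and `detect_change_points` and the parameter `eps_len` (standing for $[\epsilon N]$) are hypothetical.

```python
def max_cusum(window):
    """Return (k, value): the split size k in 1..L-1 maximising
    |S_k - (k / L) S_L| / L over the window, and the maximal value."""
    L = len(window)
    total = sum(window)
    best_k, best_value, running = 1, -1.0, 0.0
    for k in range(1, L):
        running += window[k - 1]
        value = abs(L * running - k * total) / (L * L)
        if value > best_value:
            best_k, best_value = k, value
    return best_k, best_value


def detect_change_points(y, threshold, eps_len):
    """Left-to-right search: shrink the window from the right until its left
    part looks stationary, record the last detected split, then restart
    to the right of that split."""
    if eps_len < 1:
        raise ValueError("eps_len must be at least 1")
    N = len(y)
    estimates, start = [], 0
    while N - start >= 2 * eps_len:
        end, candidate = N, None
        while end - start >= 2 * eps_len:
            k, value = max_cusum(y[start:end])
            if value <= threshold:
                break  # y[start:end] is treated as stationary
            candidate = start + k
            end = candidate - eps_len
        if candidate is None:
            break  # no change-point in the remaining data
        estimates.append(candidate)
        start = candidate + eps_len
    return estimates
```

On a noise-free sequence with levels 0, 2 and 5 on three consecutive blocks of 100 observations, the sketch first finds the split at 200, then shrinks the window and finds the split at 100, and finally recovers the split at 200 from the remaining data.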

Let $\hat k_N$ be the estimate of the number of change-points in the sample and $\hat\vartheta_N =
(\hat\theta_{N1}, \dots, \hat\theta_{N \hat k_N})^*$ be the vector of estimated coordinates of change-point parameters.
The following theorem holds for model (11).

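Theorem 5. Suppose assumptions of Theorem 3 are satisfied. Moreover, assume that
there exist $h > 0$, $B > 0$ such that for all $i = 2, \dots, k + 1$:

$$0 < \|A(\theta_{i-1}, \theta_i) A^{-1}(\theta_{i-2}, \theta_{i-1})\| \le h,$$
$$\|A(\theta_{i-1}, \theta_i)(a_i - a_{i-1})\| \ge B > 0.$$

Then for sufficiently small $\delta > 0$:

$$P_\vartheta\Big\{ \{\hat k_N \ne k\} \cup \big\{ (\hat k_N = k) \cap (\max_{1 \le i \le k} |\hat\theta_{Ni} - \theta_i| > \delta) \big\} \Big\} \le C(\delta) \exp(-D(\delta) N),$$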
l<i<k 
where constants $C(\delta) > 0$, $D(\delta) > 0$ do not depend on $N$.

An analogous theorem can also be proved for stochastic predictors.

From theorem 5 it follows that the estimated number of change-points converges 
almost surely to its unknown true value, as well as estimated coordinates of unknown 
change-points converge exponentially to their true values as the sample size tends to 
infinity. Moreover, comparing results of theorem 2 and theorem 5 we conclude that 
the proposed method of detection and estimation of multiple change-points is asymp- 
totically optimal by the order of convergence of estimated change-point parameters to 
their true values. 

The proof of theorem 5 is given in the Appendix E. 

4.3 A variant of the limit distribution theorem for the decision 
statistic under the null hypothesis 

For practical applications of the proposed method and, in particular, for the rational
choice of the decision threshold $C(N)$, we need to study the limit distribution of the
decision statistic under the null hypothesis.

Let us formulate a variant of the limit theorem for the simple case of a unique
change-point, deterministic predictors, statistically independent noises $\nu_n$, and a
one-dimensional dependent variable.

Suppose there exists a continuous function $g(t)$, $0 \le t \le 1$, such that
$E_\theta \nu_n^2 = g^2(n/N)$.

Put

$$\sigma_j(t) = \sqrt{\int_0^t f_j^2(s) g^2(s)\,ds}, \quad j = 1, \dots, K,$$

$$G(t) = (\sigma_1(t), \dots, \sigma_K(t))^*, \quad Z(t) = G(t)W(t), \quad U(t) = Z(t) - A(t)I^{-1}Z(1),$$

where $W(t)$ is the standard Wiener process, and $A(t)$, $I$ are the matrices defined above (see
Subsection 4.1.1).

Consider our main statistic, the vector process $Z_N(t) = Z_N([Nt])$ (see (7)). Then for
any $\theta \in [\beta, \alpha]$, the vector process $\sqrt{N}\,(Z_N(t) - E_\theta Z_N(t))$ weakly converges to the vector
process $U(t)$ in the Skorokhod space $D[\beta, \alpha]$ (see Brodsky, Darkhovsky (2000)). In
particular, under the null hypothesis, the weak convergence is valid on [0, 1].

Therefore, we have the following

Theorem 6.

$$\lim_{N \to \infty} P_0\{\sqrt{N} \max_{t \in [0,1]} \|Z_N(t)\| > C\} = P_0\{\max_{t \in [0,1]} \|U(t)\| > C\} \qquad (12)$$

(here we use the Euclidean norm for vectors).

The vector $U(t)$ is Gaussian with zero mean and the following $K \times K$ covariance
matrix $D(t)$:

$$D(t) = t\,[G(t)G^*(t) - G(t)G^*(1)I^{-1}A(t) - A(t)I^{-1}G(1)G^*(t)] + A(t)I^{-1}G(1)G^*(1)I^{-1}A(t).$$
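
The form of $D(t)$ can be checked by expanding the second moment of $U(t) = Z(t) - A(t)I^{-1}Z(1)$. The sketch below assumes only the definitions given above, namely $Z(t) = G(t)W(t)$ with a scalar Wiener process $W$, so that $E\,Z(s)Z^*(t) = \min(s,t)\,G(s)G^*(t)$, and symmetric matrices $A(t)$ and $I$:

```latex
\begin{aligned}
E\,U(t)U^*(t) &= E\,Z(t)Z^*(t) - E\,Z(t)Z^*(1)\,I^{-1}A(t) - A(t)I^{-1}\,E\,Z(1)Z^*(t) \\
              &\quad + A(t)I^{-1}\,E\,Z(1)Z^*(1)\,I^{-1}A(t), \\
E\,Z(t)Z^*(t) &= t\,G(t)G^*(t), \qquad
E\,Z(t)Z^*(1) = t\,G(t)G^*(1), \qquad
E\,Z(1)Z^*(1) = G(1)G^*(1).
\end{aligned}
```

Substituting the three second moments gives the three terms that carry the factor $t$ and the last term without it.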
Therefore, we have the following equality in distribution:

$$U(t) = \sqrt{D(t)}\,\zeta, \qquad (13)$$

where $\zeta = (\zeta_1, \dots, \zeta_K)^*$ is the standard Gaussian vector.
Taking (13) into account, we get

$$\max_{0 \le t \le 1} \|U(t)\| = \max_{0 \le t \le 1} \sqrt{\sum_{i=1}^{K} d_i(t)\,\zeta_i^2} \overset{def}{=} p(\zeta), \qquad (14)$$

where $d_i(t)$ are eigenvalues of the matrix $D(t)$. The function $p(\zeta)$ can be explicitly
calculated for any given family of functions $F(t)$, $g(t)$.
Therefore, from (14) we have

$$P_0\{\max_{0 \le t \le 1} \|U(t)\| > C\} = \int_{\{u:\, p(u) > C\}} \varphi(u)\,du, \qquad (15)$$

where $\varphi(u)$ is the density of the standard Gaussian distribution.

From (12) and (15) we can conclude that the type 1 error goes to zero as $\exp(-\mathrm{const} \cdot N C^2)$
for the proposed method. This fact allows us to choose the decision threshold. Note
that the same asymptotic order can be obtained from Corollary 2 (see Subsection
4.1.1). For independent noises we have

4^V%(C) J 




Po{^ max ||2^iv(n)|| > G} < k / 1.100^2 \ 



(for the notation, see Subsection 4.1.1).
Therefore, we conclude that the type 1 error $\alpha_N$ goes to zero exponentially as $N \to \infty$
for the proposed method.

So, the threshold can be calculated from the relation

$$C = C(N) \sim \sqrt{\frac{1}{N}\,|\ln \alpha_N|}\;\lambda,$$

where $\lambda$ is a certain calibration parameter which depends on variations of predictors,
dispersions of noises and characteristics of their statistical dependence.

A closer study allows us to obtain the following practical formula for the
decision threshold $C = C(N)$:



where $\sigma_i^2$ is the dispersion of $\nu_i$ and $\lambda > 0$ is the calibration parameter.

5 Experiments 

In this section we present results of a simulation study of the proposed method in
comparison with other well-known tests. The following methods are most often used
for detection of structural changes in regression models:

- The Chow test, most often used in econometric packages;

- The CUSUM (cumulative sums) test based upon recursive regression residuals
(Brown, Durbin, Evans, 1975);

- The CUSUM test based upon residuals of the ordinary least squares method (OLS
CUSUM test, Ploberger, Krämer, 1992);

- The fluctuation test (Ploberger, Krämer, Kontrus, 1989);

- The Wald test (Andrews, 1993; Andrews, Ploberger, 1994);

- The LM test (Lagrange multiplier test, Andrews, 1993).

However, it is well known (see, e.g., Maddala and Kim (1998)) that the Wald test
(together with the QMLE, quasi-maximum likelihood estimation, test) is the best and
most often used for detection of changes in regression models because it has the best
characteristics of power and accuracy of change-point estimation.

The Wald test statistic is defined as follows:

$$\mathrm{Sup}W = \max_{1 < m < N} \frac{S(N) - S_1(m) - S_2(N - m)}{S_1(m) + S_2(N - m)},$$

where $S(N)$ is the sum of squared regression residuals computed from the whole sample of
size $N$; $S_1(m)$ is the sum of squared regression residuals computed from the sub-sample of
the first $m$ observations; $S_2(N - m)$ is the sum of squared residuals of the regression model
fitted to the last $N - m$ observations.

It is natural to define the estimate of the change point as $\hat n_0 \in \arg\sup W$, and the
corresponding estimate of the change-point parameter as $\hat\theta_N = \hat n_0 / N$.
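
The ratio above can be evaluated by brute force over all split points. The following is a minimal sketch, not the implementation used in the paper: it assumes that $S$, $S_1$, $S_2$ are residual sums of squares of ordinary least squares fits of $y$ on an intercept and one predictor, and the helper names `rss` and `sup_w` as well as the `trim` parameter (minimum sub-sample length) are hypothetical.

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares of the OLS fit y ~ c0 + c1 * x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def sup_w(x, y, trim=5):
    """Return (SupW, n0_hat): the maximum over m of
    (S(N) - S1(m) - S2(N - m)) / (S1(m) + S2(N - m)) and its maximizer."""
    n = len(y)
    s_full = rss(x, y)
    best_stat, best_m = -np.inf, None
    # `trim` keeps a few points in each sub-sample (an assumption of this sketch).
    for m in range(trim, n - trim + 1):
        s1 = rss(x[:m], y[:m])
        s2 = rss(x[m:], y[m:])
        stat = (s_full - s1 - s2) / (s1 + s2)
        if stat > best_stat:
            best_stat, best_m = stat, m
    return best_stat, best_m

rng = np.random.default_rng(0)
N, n0 = 200, 100
x = np.linspace(0.0, 1.0, N)
y = 1.0 * x + rng.standard_normal(N)
y[n0:] += 3.0  # intercept shift after the change-point (illustrative size)
stat, n0_hat = sup_w(x, y)
print(stat, n0_hat, n0_hat / N)
```

The loop refits the regression for every candidate split, which costs on the order of $N$ least squares fits; this is adequate for the sample sizes used in this section.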

Comparison of characteristics of different methods is carried out in the following
way. First, the methods are 'equalized' by the value of the type 1 error by means of the choice of
the corresponding decision thresholds. In practice, for this purpose we use experiments
with stationary samples (without structural changes) in which the 95-percent quantiles
of the variation series of the decision statistics are computed (see table 1 below).
Second, for the chosen sample sizes and decision thresholds, experiments with
non-stationary samples are performed in which we compute estimates of the type 2 error
probability and of the change-point instants (see tables 2 and 4). Method 'a' of
change-point detection is preferable to method 'b' if, for the same value of the
type 1 error, it gives lower estimates of the type 2 error and of the error of change-point
estimation.
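
The two-stage protocol described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind the tables: `toy_statistic` is a stand-in CUSUM-type statistic for a mean shift and is not statistic (7); all function names and the trial counts are hypothetical.

```python
import numpy as np

def toy_statistic(y):
    """Stand-in decision statistic (NOT statistic (7) of the paper):
    max_n |sum_{i<=n} (y_i - mean(y))| / N, together with the maximizing n."""
    centered = y - y.mean()
    path = np.abs(np.cumsum(centered)) / len(y)
    n_hat = int(np.argmax(path)) + 1
    return path[n_hat - 1], n_hat

def simulate(n, theta=None, delta=0.0, rng=None):
    """Unit-variance Gaussian noise with a mean shift `delta` after [theta * n]."""
    y = rng.standard_normal(n)
    if theta is not None:
        y[int(theta * n):] += delta
    return y

def calibrate_threshold(n, trials, level, rng):
    """Stage 1: empirical `level`-quantile of the statistic over stationary samples."""
    maxima = [toy_statistic(simulate(n, rng=rng))[0] for _ in range(trials)]
    return float(np.quantile(maxima, level))

def type2_and_estimate(n, theta, delta, threshold, trials, rng):
    """Stage 2: share of missed changes and mean of theta-hat over detections."""
    misses, estimates = 0, []
    for _ in range(trials):
        stat, n_hat = toy_statistic(simulate(n, theta, delta, rng))
        if stat > threshold:
            estimates.append(n_hat / n)
        else:
            misses += 1
    mean_est = float(np.mean(estimates)) if estimates else float("nan")
    return misses / trials, mean_est

rng = np.random.default_rng(1)
C = calibrate_threshold(n=300, trials=500, level=0.95, rng=rng)
w, theta_hat = type2_and_estimate(300, theta=0.3, delta=1.0,
                                  threshold=C, trials=500, rng=rng)
print(C, w, theta_hat)
```

Any other decision statistic can be compared in the same way by swapping `toy_statistic` and calibrating its own threshold at the same level.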

5.1 Deterministic regression plan 

We compared characteristics of our method with those of the Wald test using the
following regression model with deterministic predictors:

$$y_i = c_0 + c_1 x_i + \xi_i, \quad i = 1, \dots, N, \qquad (16)$$

where $(x_1, \dots, x_N)^*$ is the vector of deterministic predictors; $\{\xi_i\}$ is the Gaussian noise
sequence with zero mean and unit variance; $c_0, c_1$ are regression coefficients which change
at the instant $n_0 = [\theta N]$, $0 < \theta < 1$.

The number of independent trials of each experiment was equal to k = 2000. The
estimates of decision thresholds were obtained as follows. For stationary samples of each size,
the 95-percent and 99-percent quantiles of the variation series of maxima of the
decision statistic were computed in 2000 trials. These quantiles were then taken as
estimates of the decision thresholds for the 5-percent and 1-percent error levels, respectively.

The values of the threshold $C$ given in table 1 were used as decision bounds for
the confidence probability of 95 percent in experiments with non-stationary regression
models. The following cases were considered:

- before the change-point: $c_0 = 0$, $c_1 = 1$;

- after the change-point: $c_0 = \delta$, $c_1 = 1$.
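
A non-stationary sample from model (16) with these two regimes can be generated as in the sketch below. The design $x_i = i/N$ is an assumption made only for illustration (the predictors actually used are not specified here), and the function name `simulate_model_16` is hypothetical.

```python
import numpy as np

def simulate_model_16(n, theta, delta, rng, c1=1.0):
    """y_i = c0 + c1 * x_i + xi_i with c0 = 0 before n0 = [theta * n]
    and c0 = delta after it; xi_i are standard Gaussian."""
    x = np.arange(1, n + 1) / n                 # assumed deterministic design
    n0 = int(np.floor(theta * n + 1e-9))        # integer part [theta * n]
    c0 = np.where(np.arange(n) < n0, 0.0, delta)
    y = c0 + c1 * x + rng.standard_normal(n)
    return x, y, n0

rng = np.random.default_rng(0)
x, y, n0 = simulate_model_16(n=1000, theta=0.3, delta=0.4, rng=rng)
print(n0, y[:3])
```

Setting `delta=0.0` gives the stationary samples needed for threshold estimation.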

In the experiments the parameter $\delta$ and the sample size $N$ were varied. The following
characteristics of the proposed method were estimated:

- The empirical estimate of the decision threshold $C$ (more exactly, the empirical
estimate of $\max_n \|Z_N(n)\|$);

- The empirical estimate of the type 2 error probability $w_N$;

- The empirical estimate of the change-point parameter $\hat\theta_N$.

Results obtained for the Wald test are given in the following tables.

Table 1. Estimation of the decision thresholds for the Wald test for
different sample sizes

N          100     200     300     400     500     700     1000    1200
p = 0.95   10.10   8.09    9.59    8.66    8.12    7.62    7.51    7.43
p = 0.99   12.60   10.88   14.14   12.10   12.20   9.97    11.68   10.02


Table 2. Estimation of the change-point parameter $\theta = 0.30$ by the Wald
test

                 N                 300     400     500     700     1000
$\delta = 0.3$   $C$               5.63    6.76    8.24    9.77    12.09
                 $w_N$             0.83    0.71    0.59    0.46    0.32
                 $\hat\theta_N$    0.29    0.25    0.22    0.19    0.20
$\delta = 0.4$   $C$               9.65    10.20   11.88   15.27   19.32
                 $w_N$             0.56    0.47    0.34    0.23    0.18
                 $\hat\theta_N$    0.28    0.25    0.22    0.20    0.23


The same model was studied with the help of the method proposed in this paper.

1) Decision thresholds

In the first series of experiments, model (16) with constant coefficients $c_0 = 0$, $c_1 = 1$
was used. The following results were obtained.

Table 3. Estimation of the decision thresholds

N          100     200     300     400     500     700     1000    1200
p = 0.95   0.401   0.257   0.202   0.182   0.150   0.125   0.103   0.081
p = 0.99   0.450   0.300   0.247   0.211   0.187   0.162   0.138   0.102



2) The estimates of the change-point parameter

Table 4. Results of estimation of the change-point parameter $\theta = 0.30$

                 N                 300     400     500     700     1000
$\delta = 0.3$   $C$               0.179   0.177   0.168   0.157   0.151
                 $w_N$             0.64    0.55    0.33    0.13    0.03
                 $\hat\theta_N$    0.340   0.322   0.332   0.324   0.307
$\delta = 0.4$   $C$               0.220   0.211   0.208   0.195   0.192
                 $w_N$             0.28    0.24    0.11    0.02    0.005
                 $\hat\theta_N$    0.315   0.312   0.308   0.305   0.304



Table 5. Results of estimation of the change-point parameter $\theta = 0.50$

                 N                 300     400     500     700     1000
$\delta = 0.3$   $C$               0.194   0.184   0.175   0.168   0.164
                 $w_N$             0.62    0.50    0.25    0.05    0.01
                 $\hat\theta_N$    0.456   0.485   0.501   0.502   0.499
$\delta = 0.4$   $C$               0.231   0.221   0.215   0.214   0.211
                 $w_N$             0.26    0.22    0.003   0.02
                 $\hat\theta_N$    0.495   0.495   0.489   0.501   0.499




Comparing results from tables 2 and 4, we conclude that the type 2 error estimates
for our method are lower than for the Wald test, and the error of estimation for our
method is much lower than for the Wald test. Therefore, in these experiments our method
is essentially better than the Wald test by the main performance characteristics of
change-point detection, which suggests that the proposed method is one of the most
effective among the known tests for detection and estimation of structural changes in
regression models.

Comparing results from tables 4 and 5, we can conclude that the quality of
estimation of the change-point parameter depends on its location in the segment [0, 1]:
a change-point closer to the bounds of the segment is more difficult to estimate.

In the next two subsections we investigate our method further.
5.2 Stochastic regression plan 

In this series of experiments the following model of observations was used:

$$y_i = c_0 + c_1 x_i + \xi_i, \quad i = 1, \dots, N,$$

where $(x_1, \dots, x_N)^*$ is a stationary random sequence of the following type:

$$x_i = \rho x_{i-1} + \eta_i, \quad i = 1, \dots, N, \quad x_0 \equiv 0,$$

$\{\xi_i, \eta_i\}$ is the sequence of independent Gaussian r.v.'s with zero mean and unit
dispersion; $c_0, c_1$ are regression coefficients which change at the instant $n_0 = [\theta N]$, $0 < \theta < 1$;
$|\rho| < 1$.
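
A sample from this model with autoregressive predictors can be generated as in the following sketch. The function name `simulate_stochastic_plan` and the way the two coefficient regimes are passed (`c_before`, `c_after`) are hypothetical; the recursion and noise assumptions follow the model above.

```python
import numpy as np

def simulate_stochastic_plan(n, theta, rho, c_before, c_after, rng):
    """x_i = rho * x_{i-1} + eta_i with x_0 = 0; y_i = c0 + c1 * x_i + xi_i,
    where (c0, c1) switches from c_before to c_after at n0 = [theta * n]."""
    eta = rng.standard_normal(n)
    xi = rng.standard_normal(n)
    x = np.empty(n)
    prev = 0.0                                   # x_0 = 0
    for i in range(n):
        prev = rho * prev + eta[i]
        x[i] = prev
    n0 = int(np.floor(theta * n + 1e-9))
    before = np.arange(n) < n0
    c0 = np.where(before, c_before[0], c_after[0])
    c1 = np.where(before, c_before[1], c_after[1])
    y = c0 + c1 * x + xi
    return x, y, n0

rng = np.random.default_rng(0)
x, y, n0 = simulate_stochastic_plan(1000, theta=0.5, rho=0.3,
                                    c_before=(0.0, 1.0), c_after=(0.0, 1.3), rng=rng)
print(n0, x.var())
```

Because the recursion starts at $x_0 = 0$, the sequence is only asymptotically stationary; for $|\rho| < 1$ its variance approaches $1/(1 - \rho^2)$ after a short burn-in.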

1) Estimation of decision thresholds 

In the first series of tests decision thresholds were estimated. For this purpose,
stationary sequences (without change-points) were used: $c_0 = 0$, $c_1 = 1$, $\rho = 0.3$. The
following results were obtained.

Table 6. Estimation of decision thresholds (the case of stochastic predictors)

N          100     200     300     400     500     700     1000    1200
p = 0.95   0.355   0.291   0.230   0.188   0.150   0.132   0.103   0.082
p = 0.99   0.401   0.332   0.273   0.218   0.192   0.171   0.141   0.100




2) Estimation of the change-point parameter 

In the following series of experiments a model with a structural change in the
regression coefficients was used:

- before the change-point: $c_0 = 0$, $c_1 = 1$;

- after the change-point: $c_0 = 0$, $c_1 = 1.3$.

Results obtained are presented in table 7.

Table 7. Estimation of change-point parameters (the case of stochastic
predictors)

                 N                 500     700     1000    1200
$\theta = 0.5$   $C$               0.167   0.157   0.152   0.152
                 $w_N$             0.32    0.21    0.02
                 $\hat\theta_N$    0.481   0.495   0.498   0.499
$\theta = 0.3$   $C$               0.156   0.148   0.142   0.140
                 $w_N$             0.45    0.30    0.03
                 $\hat\theta_N$    0.312   0.310   0.308   0.301

5.3 Multiple structural changes in multivariate systems

The following multivariate system was used:

$$y_i = c_0 + c_1 y_{i-1} + c_2 z_{i-1} + c_3 x_i + \epsilon_i,$$
$$z_i = d_0 + d_1 y_i + d_2 x_i + \xi_i,$$
$$x_i = 0.5 x_{i-1} + \nu_i,$$
$$\epsilon_i = 0.3 \epsilon_{i-1} + \eta_i,$$

where $\xi_i, \eta_i, \nu_i$, $i = 1, 2, \dots$, are independent standard Gaussian random variables.

Here $(y_i, z_i)^*$ is the vector of endogenous variables, $x_i$ is the vector of exogenous
variables, and $(y_{i-1}, z_{i-1}, x_i)^*$ is the vector of pre-determined variables of the considered
system.
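
A stationary sample path of such a system can be generated recursively, as in the sketch below. It rests on two assumptions that should be checked against the original model: that the second equation reads $z_i = d_0 + d_1 y_i + d_2 x_i + \xi_i$, and that all processes start from zero. The function name `simulate_system` is hypothetical.

```python
import numpy as np

def simulate_system(n, coef, rng):
    """Simulate (y, z, x) for a coefficient vector (c0, c1, c2, c3, d0, d1, d2).
    Assumed form: y_i depends on y_{i-1}, z_{i-1}, x_i; z_i on y_i and x_i."""
    c0, c1, c2, c3, d0, d1, d2 = coef
    nu, eta, xi = (rng.standard_normal(n) for _ in range(3))
    x, eps, y, z = (np.zeros(n) for _ in range(4))
    x_prev = eps_prev = y_prev = z_prev = 0.0    # zero start (assumption)
    for i in range(n):
        x[i] = 0.5 * x_prev + nu[i]              # exogenous AR(1) variable
        eps[i] = 0.3 * eps_prev + eta[i]         # autocorrelated noise
        y[i] = c0 + c1 * y_prev + c2 * z_prev + c3 * x[i] + eps[i]
        z[i] = d0 + d1 * y[i] + d2 * x[i] + xi[i]
        x_prev, eps_prev, y_prev, z_prev = x[i], eps[i], y[i], z[i]
    return y, z, x

rng = np.random.default_rng(0)
y, z, x = simulate_system(5000, (0.1, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6), rng)
print(y.mean(), z.mean())
```

A structural change can be imitated by calling the recursion with a different coefficient vector from the change instant onward while carrying over the last values of the previous segment.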

The dynamics of this system are characterized by the following vector of coefficients:
$u = [c_0\ c_1\ c_2\ c_3\ d_0\ d_1\ d_2]$. The initial vector of coefficients is [0.1 0.5 0.3 0.7 0.2 0.4 0.6].
The first structural change occurs at the instant $\theta_1 = 0.3$. The vector of coefficients
$u$ changes into [0.1 0.5 0.7 0.2 0.4 0.6]. The second structural change occurs at the
instant $\theta_2 = 0.7$. Then the vector $u$ changes into [0.1 0.5 0.7 0.2 0.4 0.9].

In the first series of tests the decision threshold $C$ was estimated. For this purpose,
the model with the initial vector of coefficients $u$ and without change-points was used.
In 2000 independent trials the maxima of the decision statistic were computed and
the variation series of these maxima was constructed. Then the 95-percent and the
99-percent quantiles of this series were computed. These values are presented in table
8.

Table 8. Estimation of decision thresholds (the case of a multivariate
system)

N          200     400     500     700     900     1000    1200    1500
p = 0.95   0.28    0.20    0.19    0.18    0.16    0.15    0.145   0.14
p = 0.99   0.36    0.33    0.28    0.24    0.23    0.21    0.19    0.17




The computed 95-percent quantiles were taken as the decision thresholds for
the corresponding sample sizes.

In the next series of tests non-stationary samples with multiple change-points were
used. The true number of change-points was equal to $p = 2$, and the coordinates of these
change-points were $\theta_1 = 0.3$ and $\theta_2 = 0.7$. In table 9 the following performance
characteristics are given:

- $w$ is the estimate of the probability $P_\theta\{\hat p_N \ne p\}$ in 2000 independent trials, where
$\hat p_N$ is the estimate of the number of change-points in the data.

- $\Delta$ is the estimation error on condition that $\hat p_N = p$, i.e., $\Delta = \sqrt{\sum_{i=1}^{p} (\hat\theta_i - \theta_i)^2}$.
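
Given the estimated change-point vectors from a batch of trials, these two characteristics can be computed as in the sketch below. Averaging $\Delta$ over the trials with the correct number of change-points is an assumption of this sketch (the aggregation rule is not stated above), and the function name `multi_cp_metrics` is hypothetical.

```python
import numpy as np

def multi_cp_metrics(estimates, true_points):
    """estimates: one sequence of estimated change-point parameters per trial.
    Returns (w, delta): the share of trials with a wrong number of
    change-points, and the mean Euclidean error over the remaining trials."""
    truth = np.sort(np.asarray(true_points, dtype=float))
    p = len(truth)
    wrong, errors = 0, []
    for est in estimates:
        est = np.sort(np.asarray(est, dtype=float))
        if len(est) != p:
            wrong += 1
        else:
            errors.append(float(np.sqrt(np.sum((est - truth) ** 2))))
    w = wrong / len(estimates)
    delta = float(np.mean(errors)) if errors else float("nan")
    return w, delta

trials = [[0.31, 0.69], [0.30], [0.28, 0.72], [0.3, 0.5, 0.7]]
w, delta = multi_cp_metrics(trials, [0.3, 0.7])
print(w, delta)
```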
Table 9. Estimation of change-point parameters (the case of a multivariate
system)

N          200     400     500     700     900     1000    1200    1500
$w$        0.96    0.54    0.39    0.21    0.04    0.03    0.02    0.01
$\Delta$   0.02    0.05    0.04    0.02    0.03    0.02    0.01    0.005




6 Conclusions 

In this paper the following main results were obtained:

1. The general statement of the retrospective change-point detection and estimation
problem in multivariate linear systems is given (both one change-point and multiple
change-point problems, both independent and dependent sequences of observations).

2. The a priori lower bounds are proved for the main performance characteristic in
retrospective change-point detection and estimation, the probability of the error of
change-point estimation, in different contexts: from one change-point in multi-factor
linear regressions with deterministic and stochastic regression plans to multiple
change-point problems in multivariate linear models.

3. A new method is proposed for the problem of retrospective change-point detection
and estimation in multivariate linear systems. The main performance characteristics
of this method (type 1 and type 2 errors, the error of change-point estimation) are
studied theoretically. We prove that the proposed method is asymptotically optimal by
the order of convergence of the change-point estimate to its true value as the sample
size tends to infinity.

4. For the problem of multiple change-point detection and estimation, we propose
a general setup in which both the number of change-points and their coordinates in
the sample are unknown. For this problem statement, a new method is proposed
which gives consistent estimates of an unknown number of change-points and their
coordinates. This method is also asymptotically optimal by the order of convergence
of these estimates to the true change-point parameters.

5. A simulation study of characteristics of the proposed method for finite sample
sizes is performed. The main goals of this study are as follows: to compare performance
characteristics of the proposed method with characteristics of other well-known methods
of change-point detection in linear regression models (the Wald test, the Chow test,
the CUSUM tests with ordinary and recursive regression residuals, the fluctuation test),
and to consider more general multivariate linear models and performance characteristics of
the proposed method in these multivariate models. The main conclusion: performance
characteristics of the proposed method are no worse, and often better, than those
of well-known change-point tests.

Appendix. Proofs of theorems
A Proof of Theorem 1 

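Using notations (3)-(4), put

$$\mathcal{M}(\Delta) = \{T(\Delta) : T(\Delta) = \{T_N(\Delta)\}_{N=1}^{\infty}\}.$$

This is the set of all sequences of the elements $T_N(\Delta) \in \mathcal{M}_N(\Delta)$. Consider also the
collection of all consistent estimates of the parameter $\theta \in \Delta$, i.e.,

$$\tilde{\mathcal{M}}(\Delta) = \{T(\Delta) \in \mathcal{M}(\Delta) : \lim_{N \to \infty} P_\theta(|T_N(\Delta) - \theta| > \epsilon) = 0, \ \forall \theta \in \Delta, \ \forall \epsilon > 0\}.$$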

Under the assumption of Theorem 1, the set $\tilde{\mathcal{M}}([a, b])$ is non-empty for any $0 < a < b < 1$.
Indeed, consider the sequence $y_n = \ln \frac{f_0(x_n, n/N)}{f_1(x_n, n/N)}$. Due to the assumption,
$E_\theta y_n \ge \delta > 0$ before the change-point $\theta$, $a \le \theta \le b$, and $E_\theta y_n$ is less than $(-\delta)$ after the
change-point. Now, using the same idea as in Brodsky and Darkhovsky (2000), it is
easy to construct a consistent estimate of the change-point.

Further, without loss of generality we can consider only consistent estimates of the
change-point parameter $\theta$, because for non-consistent estimates the probability of the
error of estimation does not converge to zero and the considered inequality is satisfied
trivially.

Let $\hat\theta_N$ be some consistent estimate of the change-point parameter $\theta$ constructed from
the sample $X^N = \{x_1, \dots, x_N\}$. Consider the random variable
$\lambda_N = \lambda_N(x_1, \dots, x_N) = I\{|\hat\theta_N - \theta| > \epsilon\}$.

Under the change-point parameter $\theta$, the likelihood function for the sample $X^N$ can
be written as follows:

$$f(X^N, \theta) = \prod_{i=1}^{[\theta N]} f_0(x_i, i/N) \cdot \prod_{i=[\theta N]+1}^{N} f_1(x_i, i/N).$$

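We have for any $d > 0$ and $0 < \epsilon < \epsilon'$:

$$\begin{aligned}
P_\theta\{|\hat\theta_N - \theta| > \epsilon\} = E_\theta \lambda_N &\ge E_\theta\big(\lambda_N I(f(X^N, \theta + \epsilon')/f(X^N, \theta) < e^d)\big) \\
&\ge e^{-d}\, E_{\theta + \epsilon'}\big(\lambda_N I(f(X^N, \theta + \epsilon')/f(X^N, \theta) < e^d)\big) \\
&\ge e^{-d}\big(P_{\theta + \epsilon'}\{|\hat\theta_N - \theta| > \epsilon\} - P_{\theta + \epsilon'}\{f(X^N, \theta + \epsilon')/f(X^N, \theta) \ge e^d\}\big)
\end{aligned}$$

(here we used the elementary inequality $P(AB) \ge P(A) - P(\Omega \setminus B)$).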

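Consider the probabilities in the right-hand side of the last inequality. Since $\hat\theta_N$ is
a consistent estimate of $\theta$, we have $P_{\theta + \epsilon'}\{|\hat\theta_N - \theta| > \epsilon\} \to 1$ as $N \to \infty$. For estimation
of the second probability, we take into account that

$$\ln\big(f(X^N, \theta + \epsilon')/f(X^N, \theta)\big) = \sum_{i=[\theta N]+1}^{[(\theta + \epsilon')N]} \ln\big(f_0(x_i, i/N)/f_1(x_i, i/N)\big).$$

Therefore,

$$E_{\theta + \epsilon'} \ln\big(f(X^N, \theta + \epsilon')/f(X^N, \theta)\big) = N \int_\theta^{\theta + \epsilon'} E_0 \ln \frac{f_0(x, t)}{f_1(x, t)}\,dt + o(1).$$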

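Then

$$\begin{aligned}
P_{\theta + \epsilon'}\{f(X^N, \theta + \epsilon')/f(X^N, \theta) \ge e^d\} = P_{\theta + \epsilon'}\Big\{ &\sum_{i=[\theta N]+1}^{[(\theta + \epsilon')N]} \Big(\ln \frac{f_0(x_i, i/N)}{f_1(x_i, i/N)} - E_0 \ln \frac{f_0(x_i, i/N)}{f_1(x_i, i/N)}\Big) \\
&\ge d - N \int_\theta^{\theta + \epsilon'} E_0 \ln \frac{f_0(x, t)}{f_1(x, t)}\,dt + o(1)\Big\}.
\end{aligned}$$

Put $d = d_1(N) = N\big(\int_\theta^{\theta + \epsilon'} E_0 \ln \frac{f_0(x, t)}{f_1(x, t)}\,dt + \delta\big)$ for some $\delta > 0$ and use the law of large
numbers, which holds due to the existence of $E_0 \ln \frac{f_0(x, t)}{f_1(x, t)}$. Then we obtain

$$P_{\theta + \epsilon'}\{f(X^N, \theta + \epsilon')/f(X^N, \theta) \ge e^{d_1(N)}\} \to 0$$

as $N \to \infty$.

The same considerations for $d = d_2(N) = N\big(\int_{\theta - \epsilon'}^{\theta} E_1 \ln \frac{f_1(x, t)}{f_0(x, t)}\,dt + \delta\big)$ yield

$$P_{\theta - \epsilon'}\{f(X^N, \theta - \epsilon')/f(X^N, \theta) \ge e^{d_2(N)}\} \to 0$$

as $N \to \infty$.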
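Therefore,

$$P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \ge (1 - o(1)) \max\big(e^{-d_1(N)}, e^{-d_2(N)}\big).$$

It follows from here that

$$\liminf_{N \to \infty} N^{-1} \ln \inf_{\hat\theta_N} P_\theta\{|\hat\theta_N - \theta| > \epsilon\} \ge -\min\Big(\int_\theta^{\theta + \epsilon'} J_0(t)\,dt, \ \int_{\theta - \epsilon'}^{\theta} J_1(t)\,dt\Big) - \delta.$$

Note that the left-hand side of this inequality does not depend on the parameters
$\delta$, $\epsilon'$, and the right-hand side exists for each $\delta > 0$, $\theta \wedge (1 - \theta) > \epsilon' > \epsilon > 0$. From the
continuity assumption for the functions $J_0(\cdot)$, $J_1(\cdot)$ we conclude that our result follows
after taking the limits of both sides of this inequality as $\delta \to 0$ and $\epsilon' \to \epsilon$.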



B Proof of Theorem 2 

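We will use notations (3)-(4) and (6). Let $x \in \mathbb{R}^p$, $y \in \mathbb{R}^q$, and $m = \max(p, q)$. Define
the following natural immersions:

$$\mathrm{im}_x : \mathbb{R}^p \to \mathbb{R}^m, \ \tilde x = \mathrm{im}_x\, x, \qquad \mathrm{im}_y : \mathbb{R}^q \to \mathbb{R}^m, \ \tilde y = \mathrm{im}_y\, y$$

(all lacking components are substituted by zeros) and put:

$$\mathrm{dist}(x, y) = \|\tilde x - \tilde y\|^{(m)}$$

(here we use the $\ell_\infty$-norm for a vector $x = (x_1, \dots, x_p)$, i.e., $\|x\|^{(p)} = \max_{1 \le j \le p} |x_j|$).

Consider

$$\liminf_{N \to \infty} N^{-1} \ln \inf \sup \big\{P_\vartheta(\hat\vartheta_N \in \mathcal{D}_k, \ \|\hat\vartheta_N - \vartheta\|^{(k)} > \epsilon) + P_\vartheta(\hat\vartheta_N \notin \mathcal{D}_k)\big\}. \qquad (B.1)$$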
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Note that for e < 5, any estimate ^ A^Ar(^*)5 and any -d G V^, the following 
relationships between events hold: 

(disti^N,^) >e) = {i9n e Vk, - n^""^ > e) U {^n ^ Vk, dist{^N,^) > e) = 

= i'&N e W'dr, - > e) U i'&N ^ Vk) . 

Here we used the fact that from the definition of dist and the condition {^n ^ Vk) 
it follows that {dist{i)N,^) > 6), and this condition yields dist{'dN,'d) > e) for e < 6. 

Thus, we need to estimate the probability P^(^dist{'d]\r,'d) > e). 

First, note that the set Ai(T>k) of all consistent estimates of the parameter i) G Dk 
is non-empty. This fact follows from assumption ii) of the Theorem 2 and the same 
considerations as in proof of Theorem 1 . 

Second, remark that the infimum in (B.l) can be taken only on the set A^7v(^fc)- 
In fact, let G A^jv(P*) belongs to arginf of the left-hand side of this inequality, i.e., 

inf sup P^ldisti^-d^,!!}) > e} 

= sup P^{(i^s^('^?^, -i?) > e} 

(without loss of generality we suppose that the infimum is attainable). Then consider 
the following element of the set A^7v(^fc): 

where Fat is the element of the set M.N{T^k) such that 

sup P^idistiVN.'d) < e/2} >l- K 

for some fixed k > 0. Such elements exist in A^jv(^/t) (for large enough A^), because 
this set contains consistent estimates. 

By definition, -^n G M.Ni'^k) and for each G Vk, 

P^{dist0N, e} = P.s{dist{^*^, t?) > e} + P^{c?zsi(F^, ^) > e}. 

Therefore, 

sup P^{dist{^jM,-&) > e} < sup PiJ{^^^s^('^9^, > e} 
+ sup P^{dist{TN,^) > e} 

= inf sup P^{dist{{}N,{}) > e} + sup P^{dist{r]\f,'d) > e} 

< inf sup P^{dist{'dN,'^) > + 1^ 
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So, 



K+ inf sup P^{dist{'dN,'d) > e} > 

> inf sup P^{dist{i9pf,'&) > e} > 

inf sup 'P^{dist{'dM,'&) > 

and this is the fact we wanted to show. 

By the definition of dist, we have on the set Jiij\f{Vk): 

dlsti^N,^) = II^^AT-^^II^'^). 

Further, for any i = 1, . . . , k the following inclusion holds 

where 6i{N) is the i-th component of the vector ■i^Tv- 
Therefore, 

P4II^^ - n^'^ >e,^Ne Vk} > max P4|^.(iV) - 6^] > e,^N e V^}. 

l<i<k 

But estimation of the value 

liminfiV-Mn inf sup Pi){\ei{N) - > e,^N V^} = A, 

is exactly the problem already considered in the proof of Theorem 1 for the case of a
unique change-point. Therefore,

j T~\t)dt, j J,{t)dt J . 

So, finally we obtain 

liminf^v^oo A^"^ln inf sup P^{||^?7V - ^f^^ > e,^N ^ V^} > 

( 9i+e 9i 

> - min min / J'-^{t)dt, J Ji{t)dt 
^-^-^ \ e, e,~e 

This completes the proof.

C Proof of Theorem 3

1 

Due to the assumptions, the matrix I = J F{t)F*{t)dt is positive definite. Therefore, 



there exists the matrix [A^ ("^i^) ] for all > No(F). The constant Nq(F) can be 
exactly estimated for any given family of functions F{t). 

Let us consider the matrix random process with continuous time Zi^{t) =^ Zpf{[Nt]), t G 
[0,1]. 

It is easy to see that the mathematical expectation of the process Zf^{t) can be 
written as follows: 

(m 

^eZ^{t) = iV-i E F(z/iV)F*(z/iV)n*(e,z) 

TV \ 

_pm^pNYi j2 F{i/N)F*{i/N)U*{9,i) . 

i=l J 

After simple transformations we obtain that m{t) '= lim YigZi^it) has the form: 

I AM/-.(/-AW)(a-b).. t<e 

\ [I - A{t))I'^A{e){ai-h)\ t>e, 

Consider the square of the Hilbert-Schmidt norm of the matrix $m(t)$, i.e., the function
$f(t) = \mathrm{tr}(m^*(t)m(t))$, and show that the function $f(t)$ has a unique global maximum
on the segment [0, 1] at the point $t = \theta$.

First, for each $t \le \theta$:

$$f(\theta) - f(t) = \mathrm{tr}\big(B^*(A^2(\theta) - A^2(t))B\big),$$

where the matrix $B$ was defined in Theorem 3. Consider the matrix

$$A^2(\theta) - A^2(t) = A(\theta)(A(\theta) - A(t)) + (A(\theta) - A(t))A(t).$$
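
The splitting of $A^2(\theta) - A^2(t)$ into two products is a purely algebraic identity and does not require the matrices to commute. The sketch below checks it numerically on randomly generated symmetric positive definite matrices; the matrices and the helper `random_spd` are hypothetical stand-ins, not the matrices $A(t)$ of the model.

```python
import numpy as np

def random_spd(k, rng):
    """A random symmetric positive definite k x k matrix."""
    g = rng.standard_normal((k, k))
    return g @ g.T + k * np.eye(k)

rng = np.random.default_rng(0)
A_t = random_spd(4, rng)
A_theta = A_t + random_spd(4, rng)   # so that A(theta) - A(t) is positive definite

lhs = A_theta @ A_theta - A_t @ A_t
rhs = A_theta @ (A_theta - A_t) + (A_theta - A_t) @ A_t
identity_holds = bool(np.allclose(lhs, rhs))

B = rng.standard_normal((4, 2))      # a full-rank matrix (almost surely)
gap = float(np.trace(B.T @ lhs @ B)) # analogue of f(theta) - f(t)
print(identity_holds, gap)
```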

Denote $L = A(\theta)(A(\theta) - A(t))$ and prove that the matrix $L$ is positive definite for
$t < \theta$. In fact, since the matrix $A(\theta)$ is symmetric and positive definite, we can write

$$x^* L x = x^* A^{1/2}(\theta) A^{1/2}(\theta)(A(\theta) - A(t))x = y^* A^{1/2}(\theta)(A(\theta) - A(t))A^{-1/2}(\theta)\, y,$$

where $y = A^{1/2}(\theta)x$.

The matrices $A(\theta) - A(t)$ and $A^{1/2}(\theta)(A(\theta) - A(t))A^{-1/2}(\theta)$ have identical
characteristic polynomials and eigenvalues. Besides, $A(\theta) - A(t)$ is positive definite for $t < \theta$.
Therefore, the matrix $A^{1/2}(\theta)(A(\theta) - A(t))A^{-1/2}(\theta)$ is also positive definite for $t < \theta$,
and therefore the matrix $L$ is positive definite.

By analogy, the matrix $(A(\theta) - A(t))A(t)$ is positive definite for $t < \theta$. Therefore,
the matrix $A^2(\theta) - A^2(t)$ is positive definite for $t < \theta$.

Now consider the matrix $D = B(A^2(\theta) - A^2(t))B^*$. The matrix $D$ is positive definite
if $\mathrm{rank}(B) = M$, but this is our assumption.

So, we obtain $\mathrm{tr}(B(A^2(\theta) - A^2(t))B^*) > 0$ for $t < \theta$ and therefore the function $f(t)$
has a unique global maximum on the segment $[0, \theta]$ at the point $t = \theta$.

The same considerations for $t > \theta$ yield that $f(t)$ monotonically decreases on the
segment $[\theta, 1]$. As a result, we obtain that $f(t)$ has a unique global maximum on the
segment [0, 1] at the point $t = \theta$.
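Further, we are going to show the following: there exists a positive constant $c$ such
that $f(\theta) - f(t) \ge c \cdot |\theta - t|$. This estimate can be obtained as follows. Taking into
account the continuity of the functions $f_j(t)$, we obtain

$$A(\theta) - A(t) = \int_t^\theta F(\tau)F^*(\tau)\,d\tau = (\theta - t)U(t, \theta) > 0, \qquad (C.1)$$

where the matrix $U(t, \theta)$ is positive definite for $0 < t < \theta$ and negative definite for
$t > \theta$. Due to the continuity, we can write

$$U(t, \theta) = U(\theta, \theta) + \kappa(t, \theta), \qquad (C.2)$$

where $\kappa(t, \theta) \to 0$ as $t \to \theta$.
Then

$$\begin{aligned}
f(\theta) - f(t) &= \mathrm{tr}\big(B^*(A^2(\theta) - A^2(t))B\big) \\
&= \mathrm{tr}\big(BB^*A(\theta)(A(\theta) - A(t))\big) + \mathrm{tr}\big(BB^*(A(\theta) - A(t))A(t)\big) \\
&= (\theta - t)\,\mathrm{tr}\big((a - b)^*(a - b)V(t, \theta)\big),
\end{aligned} \qquad (C.3)$$

where $V(t, \theta) = (E - A(\theta)I^{-1})\,(A(\theta)U(t, \theta) + U(t, \theta)A(t))\,(E - I^{-1}A(\theta))$.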
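Taking into account (C.1) and (C.2), we have

$$\begin{aligned}
V(t, \theta) &= (E - A(\theta)I^{-1})\,(A(\theta)U(t, \theta) + U(t, \theta)A(t))\,(E - I^{-1}A(\theta)) \\
&= (E - A(\theta)I^{-1})\,(A(\theta)U(\theta, \theta) + U(\theta, \theta)A(\theta))\,(E - I^{-1}A(\theta)) \\
&\quad + (E - A(\theta)I^{-1})\,(A(\theta)\kappa(t, \theta) + \kappa(t, \theta)A(\theta))\,(E - I^{-1}A(\theta)) \\
&\quad + (t - \theta)(E - A(\theta)I^{-1})\,U(t, \theta)U(t, \theta)\,(E - I^{-1}A(\theta)).
\end{aligned} \qquad (C.4)$$

Denote

$$\begin{aligned}
G(\theta) &= (E - A(\theta)I^{-1})\,(A(\theta)U(\theta, \theta) + U(\theta, \theta)A(\theta))\,(E - I^{-1}A(\theta)), \\
R(t, \theta) &= (E - A(\theta)I^{-1})\,(A(\theta)\kappa(t, \theta) + \kappa(t, \theta)A(\theta))\,(E - I^{-1}A(\theta)), \\
H(t, \theta) &= (E - A(\theta)I^{-1})\,U(t, \theta)U(t, \theta)\,(E - I^{-1}A(\theta))
\end{aligned} \qquad (C.5)$$

and put

$$\tilde G(\theta) = \begin{cases} G(\theta), & \theta \ge t, \\ -G(\theta), & \theta < t. \end{cases} \qquad (C.6)$$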

Then from (C.3), (C.4), (C.5) and (C.6) we get

$$\begin{aligned}
f(\theta) - f(t) &= |\theta - t|\,\mathrm{tr}\big((a - b)^*(a - b)\tilde G(\theta)\big) \\
&\quad + (\theta - t)\,\mathrm{tr}\big((a - b)^*(a - b)R(t, \theta)\big) \\
&\quad - (\theta - t)^2\,\mathrm{tr}\big((a - b)^*(a - b)H(t, \theta)\big).
\end{aligned} \qquad (C.7)$$

Since $R(t, \theta) \to 0$ as $t \to \theta$ and $H(t, \theta)$ is positive definite, we conclude that

$$f(\theta) - f(t) \ge |\theta - t|\,\mathrm{tr}\big((a - b)^*(a - b)\tilde G(\theta)\big) + o(|t - \theta|),$$

i.e., there exists a positive definite matrix $W(\theta)$ such that

$$\|m(\theta)\|^2 - \|m(t)\|^2 = f(\theta) - f(t) \ge |\theta - t|\,\mathrm{tr}\big((a - b)^*(a - b)W(\theta)\big)$$

for some neighborhood of $\theta$. Therefore, we have got the estimate of sharpness of the
maximum for the function $f(t)$:

$$f(\theta) - f(t) \ge |\theta - t|\,\lambda_F\,\mathrm{tr}\big[(a - b)^*(a - b)\big], \qquad (C.8)$$

where

$$\lambda_F \overset{def}{=} \min_{\beta \le \theta \le \alpha} \frac{\mathrm{tr}\big[(a - b)^*(a - b)W(\theta)\big]}{\mathrm{tr}\big[(a - b)^*(a - b)\big]}.$$

Let us describe how to calculate $\lambda_F$. For a given family of functions $F(t)$ we can
calculate the function $f(t) = \mathrm{tr}[m^*(t)m(t)]$. Then it is possible to calculate

$$\lambda_F = \min_{\beta \le t \le \alpha,\ \beta \le \theta \le \alpha} \frac{f(\theta) - f(t)}{|\theta - t|\,\mathrm{tr}\big[(a - b)^*(a - b)\big]}.$$

Due to the condition $0 < \beta \le \theta \le \alpha < 1$, we get $\lambda_F > 0$ (see (C.5)). Note that from
(C.8) and the definition of $f(t)$ we have for any $t \in [\beta, \alpha]$:

$$\|m(\theta)\|^2 - \|m(t)\|^2 \ge \lambda_F\,|\theta - t|\,\|a - b\|^2. \qquad (C.9)$$

The process $Z_N(t)$ can be decomposed into deterministic and stochastic terms:

$$Z_N(t) = m(t) + \gamma_N(t) + \eta_N(t), \qquad (C.10)$$

where the norm of the deterministic function $\gamma_N(t)$ converges to zero with the rate
$L_F/N$ (this term estimates the difference between the corresponding integral sum and the
integral; the constant $L_F$ depends on the function family $F(t)$ and can be estimated
explicitly for any given family), and the stochastic term is equal to

([Nt] N 

The norm of the process ?7jv(t) can be estimated as follows: 

sup Wv^m <r\VK+ \\I\\ ■ + ^(||J|| + + L^/N) 

/3<t<a L 

X I max max max A^^-^l V fjij lN)viA | =^ ^n^^\ 

\l<i<Kl<l<M[fiN]<n<N \j^^J^yJI ) I l<---llj 



n 



= Ti\ max max max ^1 V fiii/N)^^] 

\l<i<K 1<1<M [i3N]<n<N j^i 

where TZ = TZ{F, N). Here we used the following relations 

max WN^'^vl^'^ - Mt)\\ < max < ||/| 



\\N{v^y' - r'\\\ < 



N 



and took into account that for any matrix $M$ we have the relation $\|M\| = \sqrt{\mathrm{tr}(M^*M)} \le
R \max_{i,j} |m_{ij}|$, where the constant $R$ depends only on the dimensionality.

Denote $S_n = \sum_{j=1}^{n} \xi(j)$, $\xi(j) = f_i(j/N)\nu_{lj}$, and
put $\sigma^2 = \sup_i \sup_{1 \le n \le N} \sup_{1 \le l \le M} E_\theta\big(f_i(n/N)\nu_{ln}\big)^2$. Choose the number $\epsilon(x)$ from the following
condition

$$\ln(1 + \epsilon(x)) = \begin{cases} x^2/4g, & x \le gT, \\ xT/4, & x > gT, \end{cases}$$

where the constant $T$ is taken from the uniform Cramér condition and $g > \sigma^2$.

For the chosen $\epsilon(x) = \epsilon$, we choose the number $m_0(x) \ge 1$ from the uniform
$\psi$-mixing condition such that $\psi(m) \le \epsilon$ for $m \ge m_0(x)$.

Decompose the sum $S_n$ into groups of weakly dependent terms:

$$S_n = S_n^1 + S_n^2 + \dots + S_n^{m_0(x)},$$

where

$$S_n^i = \xi(i) + \xi(i + m_0(x)) + \dots + \xi\Big(i + m_0(x)\Big[\frac{n - i}{m_0(x)}\Big]\Big)$$

and $i = 1, 2, \dots, m_0(x)$.

The number of summands $k(i)$ in each group is no less than $[n/m_0(x)]$ and no more
than $[n/m_0(x)] + 1$. The $\psi$-mixing coefficient between summands within each group is
no larger than $\epsilon$. Therefore,

$$P_\theta\{|S_n|/n \ge x\} \le \sum_{i=1}^{m_0(x)} P_\theta\{|S_n^i|/n \ge x/m_0(x)\} \le m_0(x) \max_{1 \le i \le m_0(x)} P_\theta\{|S_n^i| \ge (k(i) - 1)x\}. \qquad (C.12)$$

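From Chebyshev's inequality we have:

$$P_\theta\Big\{S_n^i = \sum_j \xi(i + m_0 j) \ge x\Big\} \le e^{-tx}\,E_\theta e^{tS_n^i}, \quad \forall t > 0. \qquad (C.13)$$

Further, from the $\psi$-mixing condition it follows that (see Ibragimov, Linnik (1971)):

$$E_\theta e^{tS_n^i} \le (1 + \epsilon)^k\, E_\theta \exp(t\xi(i))\, E_\theta \exp(t\xi(i + m_0)) \cdots E_\theta \exp(t\xi(i + m_0 k)). \qquad (C.14)$$

Consider the term $E_\theta \exp(t\xi(i))$. From the uniform Cramér condition it follows
that for each $0 < t < T$:

$$E_\theta e^{t\xi(i)} \le \exp(t^2 g/2).$$

Then from (C.13) and (C.14) we obtain

$$P_\theta\{S_n^i \ge x\} \le (1 + \epsilon)^k \exp(k g t^2/2 - tx).$$

Taking the minimum of $k g t^2/2 - tx$ w.r.t. $t$, write

$$P_\theta\{S_n^i \ge x\} \le \begin{cases} (1 + \epsilon)^k \exp(-x^2/2kg), & x \le kgT, \\ (1 + \epsilon)^k \exp(-xT/2), & x > kgT. \end{cases}$$

From the definition of $\epsilon$ we obtain

$$P_\theta\{|S_n^i/k| \ge x\} \le \begin{cases} \exp(-kx^2/4g), & x \le gT, \\ \exp(-kxT/4), & x > gT. \end{cases} \qquad (C.15)$$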
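
The two cases arise from minimizing the exponent over the admissible range $0 < t \le T$. A short worked version is given below, in the notation of (C.13) and (C.14) and assuming that the minimizer is simply taken at $t = T$ when the unconstrained minimum falls outside the range:

```latex
\begin{aligned}
h(t) &= \frac{k g t^2}{2} - t x, \qquad h'(t) = k g t - x = 0 \ \Rightarrow\ t^* = \frac{x}{k g}, \\
x \le k g T &: \quad t^* \le T, \qquad h(t^*) = -\frac{x^2}{2 k g}, \\
x > k g T &: \quad t = T, \qquad h(T) = \frac{k g T^2}{2} - T x < \frac{T x}{2} - T x = -\frac{x T}{2}.
\end{aligned}
```

Replacing $x$ by $kx$ and using the choice of $\ln(1 + \epsilon)$ made earlier turns the factor $(1 + \epsilon)^k$ into $\exp(kx^2/4g)$ in the first case and $\exp(kxT/4)$ in the second, which leaves the exponents $-kx^2/4g$ and $-kxT/4$.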

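Now, using (C.12) and (C.15), we obtain

$$P_\theta\{|S_n/n| \ge x\} \le \begin{cases} m_0(x) \exp\Big(-\dfrac{x^2 n}{4 g\, m_0(x)}\Big), & x \le gT, \\[2mm] m_0(x) \exp\Big(-\dfrac{T x n}{4 m_0(x)}\Big), & x > gT. \end{cases} \qquad (C.16)$$

From (C.11) and (C.16) we get

$$P_\theta\Big\{\sup_{\beta \le t \le \alpha} \|\eta_N(t)\| > \epsilon\Big\} \le m_0(\epsilon/\mathcal{R}) \begin{cases} \exp\Big(-\dfrac{(\epsilon/\mathcal{R})^2 N \beta}{4 g\, m_0(\epsilon/\mathcal{R})}\Big), & \epsilon \le \mathcal{R} g T, \\[2mm] \exp\Big(-\dfrac{T (\epsilon/\mathcal{R}) N \beta}{4 m_0(\epsilon/\mathcal{R})}\Big), & \epsilon > \mathcal{R} g T. \end{cases} \qquad (C.17)$$

In particular, for the case of independent observations, $m_0(\epsilon) = 1$.

From the definition of the estimate $\hat\theta_N$ and (C.9) we can write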



Pe i \0n -0\> e, Or, e Arg max \\Zr,{t)\ 

' li<t<a 

Oil > ll^ivWIUe l^^-^l >e} 



= P4ll^iv(^ _ 

<P4hiv(^iv)||-||^ivW||>||m^^^l|2 



<Ve{ sup \\TiN{t)\\ > 

.I3<t<a 



< Pe< sup \\r]^{t)\\ > 

ll3<t<a 

where Ai = max ||m(^||. 

(3<e<a 

Denote C(e, iV) = 



4M0||"''(='-'''"'^-'^)'-A'J 
£A^t.((a-b).(a-b))-i£; 



< 



|a b|| 



. Then, finally we obtain from (C.18): 



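$$\sup_{\beta \le \theta \le \alpha} P_\theta\{|\hat\theta_N - \theta| > \varepsilon\} \le m_0(C(\varepsilon,N)/\mathcal{R}) \begin{cases} \exp\left(-\dfrac{N\beta\,(C(\varepsilon,N)/\mathcal{R})^2}{4 g\, m_0(C(\varepsilon,N)/\mathcal{R})}\right), & \text{if } C(\varepsilon,N) \le \mathcal{R} g T, \\[2mm] \exp\left(-\dfrac{T N\beta\,(C(\varepsilon,N)/\mathcal{R})}{4 m_0(C(\varepsilon,N)/\mathcal{R})}\right), & \text{if } C(\varepsilon,N) > \mathcal{R} g T. \end{cases}$$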

Remark 6. In case of only one regression relationship and independent noises Vi, we 
obtain from here 



exp 



F 

AM 



sup Ve{\e^-e\ >e}<< 

(S<e<a 



tfC{e,N)<ngT 



exp 



TN(5e 

An 



AM 



if C{e,N)>ngT. 
Theorem 3 is proved.

Corollary 2 can be obtained (as follows from the proof) from the estimates of $P_\theta\{\sup_{\beta \le t \le \alpha} \|\eta_N(t)\| > \varepsilon\}$ for $\theta = 0$ or $\theta \ne 0$.

D Proof of Theorem 4 

The proof is based on the same ideas as in Section C, so we give only a sketch of the proof.

Let us consider the continuous-time matrix random process $Z_N(t) \stackrel{\mathrm{def}}{=} Z_N([Nt])$, $t \in [0,1]$.

It is easy to see that the mathematical expectation of the process Zjv(t) can be 
written as follows: 

([Nt] N 
V{n/N)W{9, n) - Tr\r,'')-' J] V{n/N)W{9, n) 
n=l n=l 

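Denote $\mathcal{M}(t) \stackrel{\mathrm{def}}{=} \lim_{N \to \infty} E_\theta Z_N(t)$. After simple transformations we have

$$\mathcal{M}(t) = \begin{cases} M(t) M^{-1} (M - M(\theta))(a - b)^*, & t \le \theta, \\ (M - M(t)) M^{-1} M(\theta)(a - b)^*, & t > \theta. \end{cases} \tag{D.1}$$

It can be shown from (D.1) (by arguments analogous to those in Section C) that the function $\Phi(t) \stackrel{\mathrm{def}}{=} \|\mathcal{M}(t)\|^2 = \mathrm{tr}\,(\mathcal{M}(t)\mathcal{M}^*(t))$ has a unique global maximum on the segment $[0, 1]$ at the point $t = \theta$, and there exists $\lambda_V > 0$ such that the following inequality holds

$$\Phi(\theta) - \Phi(t) \ge \lambda_V |\theta - t|\, \mathrm{tr}\,[(a - b)(a - b)^*] \tag{D.2}$$

for any $\beta \le t \le \alpha$. The constant $\lambda_V$ depends only on $V(t)$ and can be estimated analogously to the corresponding constant from Section C.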
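The statements around (D.1) and (D.2) can be checked numerically in the simplest scalar setting. The sketch below assumes $V(s) \equiv 1$, so that the matrix integral up to $t$ equals $t$ and the full integral equals $1$ (taking $M(t) = \int_0^t V(s)\,ds$ is an assumption of this illustration), and scalar coefficients $a$, $b$. The function names, the grid, and the particular constant `lam` are hypothetical choices, not the paper's $\lambda_V$.

```python
def limit_mean(t, theta, a, b):
    """Scalar version of (D.1) with V(s) = 1, i.e. M(t) = t and M = 1."""
    if t <= theta:
        return t * (1 - theta) * (a - b)
    return (1 - t) * theta * (a - b)


def phi(t, theta, a, b):
    """Squared norm of the limit mean; in the scalar case simply its square."""
    return limit_mean(t, theta, a, b) ** 2


# Hypothetical change-point and coefficients before/after the change.
theta, a, b = 0.3, 2.0, -1.0
grid = [i / 1000 for i in range(1001)]

# Location of the maximum of phi on the grid.
t_max = max(grid, key=lambda t: phi(t, theta, a, b))

# One admissible constant for the linear lower bound in this toy case.
lam = theta ** 2 * (1 - theta) ** 2
bound_holds = all(
    phi(theta, theta, a, b) - phi(t, theta, a, b)
    >= lam * abs(theta - t) * (a - b) ** 2 - 1e-12
    for t in grid
)
```

In this toy case the maximum sits at the change-point and the gap grows at least linearly in $|\theta - t|$, which is the property the proof uses to convert a bound on the noise term into a bound on the estimation error.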

Consider matrix sequence N~^T^ . Due to the assumptions, this sequence Pg-a.s. 

tends to the positive definite matrix $M = \int_0^1 V(s)\,ds$, and the rate of convergence is exponential. Therefore, there exists a number $N_1 = N_1(\{X(n)\})$ such that for $N > N_1$ we get

P4II^^"'V - M > e} < L{e) exp {-K{e)N) , (D.3) 
where the functions $L(\varepsilon)$, $K(\varepsilon)$ can be estimated exactly (taking into account the $\psi$-mixing condition and Cramér's condition) by the scheme of Section C. The number $N_1$ can be estimated from the random sequence $\{X(n)\}$.
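The process $Z_N(t)$ can be written as follows:

$$Z_N(t) = \mathcal{M}(t) + r_N(t) + \zeta_N(t),$$

where $\mathcal{M}(t) = \lim_{N \to \infty} E_\theta Z_N(t)$ is the limit introduced above, $r_N(t) = E_\theta Z_N(t) - \mathcal{M}(t)$ and $\zeta_N(t) = Z_N(t) - E_\theta Z_N(t)$.

Note that $\max_{0 \le t \le 1} \|r_N(t)\| \le L_V/N$ (because this is the difference between the sum and the integral), and the constant $L_V$ can be estimated exactly for any given function $V(t)$.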
Fix e, < e < min {{a — /3), ||]R||/2) and consider the events 



= {\\N-%^ < ||M||/2, 
max ||iV-ir'^*l -M(t)|| < e, ||Ar(7;^)-i - R-i|| < e}, 

0<t<l 

$\bar{D}_N = \Omega \setminus D_N$.

Note that matrix N^^T^ is non-degenerate on the set Dj^. Then, due to (D.3), 

$$\delta_N(\varepsilon) = P_\theta(\bar{D}_N) \le 3L(\varepsilon) \exp(-K(\varepsilon)N). \tag{D.4}$$

Further, analogously to (C.11), we can write on the set $D_N$



sup llCiv 
I3<t<a 



< R 



V^+||R||-||M-i+e(||M|| 



-^11+^) 



X I max max max -"^1 V 

l<i<_ft: 1</<A/ [/3Ar]<r!.<Af j^i 



def 



P-5) 



R max max max ^1 XijVu\ 

^ l<i<K 1<1<M [l3N]<n<N JZi 



where R = R(V",e). 

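Now we can use (C.17) and get (by analogous reasoning) from (D.5) on the set $D_N$:

$$P_\theta\Big\{\sup_{\beta \le t \le \alpha} \|\zeta_N(t)\| > \varepsilon,\ D_N\Big\} \le m_0(\varepsilon/R) \begin{cases} \exp\left(-\dfrac{(\varepsilon/R)^2 N\beta}{4 g\, m_0(\varepsilon/R)}\right), & \varepsilon \le R g T, \\[2mm] \exp\left(-\dfrac{T(\varepsilon/R) N\beta}{4 m_0(\varepsilon/R)}\right), & \varepsilon > R g T. \end{cases} \tag{D.6}$$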
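Using (D.4), (D.6), and considerations analogous to those in (C.18), we get

$$\sup_{\beta \le \theta \le \alpha} P_\theta\{|\hat\theta_N - \theta| > \varepsilon\} \le \delta_N(\varepsilon) + m_0(C(\varepsilon,N)/R) \begin{cases} \exp\left(-\dfrac{N\beta\,(C(\varepsilon,N)/R)^2}{4 g\, m_0(C(\varepsilon,N)/R)}\right), & \text{if } C(\varepsilon,N) \le R g T, \\[2mm] \exp\left(-\dfrac{T N\beta\,(C(\varepsilon,N)/R)}{4 m_0(C(\varepsilon,N)/R)}\right), & \text{if } C(\varepsilon,N) > R g T, \end{cases}$$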



where C(e, A^) 



II 1-112 
a — b 

L4M" " N 



Theorem 4 is proved. 



max ||M(t)||. 

p<t<a 



E Proof of Theorem 5 

The proposed method of multiple change-point detection and estimation is based upon 
the idea of recurrent reduction to the case of one change-point. 
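The idea of recurrent reduction can be sketched as a recursive segmentation: test a sub-sample for a change, split it at the estimated change-point if the statistic exceeds the threshold, and repeat on both parts. The sketch below is only an illustration of this scheme under strong simplifications: it uses a scalar CUSUM-type statistic for a shift in mean as a stand-in for the norm of the paper's decision statistic, and the names `cusum_argmax`, `find_change_points`, `threshold` and `min_size` are hypothetical.

```python
def cusum_argmax(y):
    """Return (split index k, value) maximizing |S_k - (k/n) S_n| / n over 1 <= k < n."""
    n = len(y)
    total = sum(y)
    best_k, best_val, partial = 0, 0.0, 0.0
    for k in range(1, n):
        partial += y[k - 1]
        val = abs(partial - k / n * total) / n
        if val > best_val:
            best_k, best_val = k, val
    return best_k, best_val


def find_change_points(y, threshold, min_size, offset=0):
    """Recursively split y while the statistic exceeds the threshold."""
    if len(y) < 2 * min_size:
        return []
    k, val = cusum_argmax(y)
    if val <= threshold or k < min_size or len(y) - k < min_size:
        return []  # sub-sample is classified as stationary
    left = find_change_points(y[:k], threshold, min_size, offset)
    right = find_change_points(y[k:], threshold, min_size, offset + k)
    return left + [offset + k] + right


# Noise-free toy sample with two changes in mean (hypothetical values).
sample = [0.0] * 50 + [3.0] * 50 + [0.0] * 50
detected = find_change_points(sample, threshold=0.1, min_size=10)
```

The toy sample contains two change-points in the initial sub-sample, which is the situation treated in proposition ii): the statistic must still exceed the threshold there so that the first split is made.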

In order to prove Theorem 5 we need to prove the following two propositions:

i) in the case of a stationary sub-sample, the norm of the decision statistic does not exceed the threshold with high probability. This fact is exactly the result of Corollary 2;

ii) in the case of a non-stationary sub-sample with at least two change-points, the norm of the decision statistic exceeds the decision threshold with high probability.

In order to illustrate ii), let us consider a sub-sample with two change-points $0 < \theta_1 < \theta_2 < 1$.

In this case the decision statistic can be decomposed into a deterministic and a stochastic term (see (C.10)).

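We have from (CO) for $0 \le t \le \theta_1$:

$$m(t) = A(t)a_1 - A(t)A^{-1}(1)\big(A(\theta_1)a_1 + A(\theta_1,\theta_2)a_2 + A(\theta_2,1)a_3\big) = A(t)\big(a_1 - A^{-1}(1)u\big), \tag{E.1}$$

where $u = A(\theta_1)a_1 + A(\theta_1,\theta_2)a_2 + A(\theta_2,1)a_3$.

Again using (CO), we get for $\theta_1 < t \le \theta_2$:

$$m(t) = A(\theta_1)a_1 + A(\theta_1,t)a_2 - A(t)A^{-1}(1)u = A(\theta_1)\big(a_1 - A^{-1}(1)u\big) + A(\theta_1,t)\big(a_2 - A^{-1}(1)u\big).$$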
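The piecewise formulas for $m(t)$ can be made concrete in a scalar toy case. The sketch below assumes $A(s,t) = t - s$ and $A(t) = t$ (so $A^{-1}(1) = 1$), with scalar coefficients $a_1, a_2, a_3$; the third branch, for $t > \theta_2$, is written by the same pattern and is an assumption of this illustration, as are the function name and the sample values.

```python
def m_det(t, th1, th2, a1, a2, a3):
    """Scalar deterministic term m(t) for two change-points, assuming A(s, t) = t - s."""
    u = th1 * a1 + (th2 - th1) * a2 + (1 - th2) * a3  # A^{-1}(1) = 1 here
    value = min(t, th1) * (a1 - u)  # branch of (E.1)
    if t > th1:
        value += (min(t, th2) - th1) * (a2 - u)  # second branch
    if t > th2:
        value += (t - th2) * (a3 - u)  # third branch: assumed by the same pattern
    return value


# Hypothetical example: the regime returns to its initial coefficient.
th1, th2 = 0.3, 0.7
a1, a2, a3 = 1.0, 3.0, 1.0
grid = [i / 100 for i in range(101)]
max_abs_m = max(abs(m_det(t, th1, th2, a1, a2, a3)) for t in grid)
```

Even though the first and last regimes coincide in this example, the deterministic term is bounded away from zero at the change-points, which is the kind of lower bound the subsequent argument establishes in general.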
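If $\|m(\theta_1)\| \ge \Delta$, then $\max_{\beta \le t \le \alpha} \|m(t)\| \ge \Delta > 0$.

Otherwise, let $\|m(\theta_1)\| < \Delta$. Then

$$\begin{aligned} \|m(\theta_2)\| &\ge \|A(\theta_1,\theta_2)(a_2 - A^{-1}(1)u)\| - \Delta \\ &= \|A(\theta_1,\theta_2)(a_2 - a_1 + a_1 - A^{-1}(1)u)\| - \Delta \\ &\ge \|A(\theta_1,\theta_2)(a_2 - a_1)\| - \|A(\theta_1,\theta_2)(a_1 - A^{-1}(1)u)\| - \Delta \\ &\ge B - \|A(\theta_1,\theta_2)A^{-1}(\theta_1)\|\,\Delta - \Delta \ge B - \Delta(1 + h) \ge \Delta. \end{aligned}$$

Therefore, taking into account (E.1), we get: there exists $\Delta > 0$ such that

$$\max_{\beta \le t \le \alpha} \|m(t)\| \ge \Delta. \tag{E.2}$$

From (E.2) it follows that ii) holds with high probability.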

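After these preliminary considerations, let us consider the probability of the event

$$\{k_N \ne k\} \cup \Big\{(k_N = k) \cap \big(\max_{1 \le i \le k} |\hat\theta_i - \theta_i| > \delta\big)\Big\} \tag{E.3}$$

for some fixed $\delta$, $\varepsilon > \delta > 0$. Let us consider the following cases:

a) $\{k_N < k\}$, b) $\{k_N > k\}$, c) $\{(k_N = k) \cap (\max_{1 \le i \le k} |\hat\theta_i - \theta_i| > \delta)\}$.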

Case a) 

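In this case the proposed method fails to detect at least one change-point, i.e., a certain sub-sample of size $> [2\delta N]$ containing at least one true change-point is classified as stationary. Then

$$P_\theta\{k_N < k\} \le P_\theta\Big\{\max_{\beta \le t \le \alpha} \|Z_N(t)\| < C(N)\Big\}, \tag{E.4}$$

where $C(N)$ is the decision threshold for the sub-sample.

Choose $C(N) < \Delta$. Then due to (E.4) and (C.10) we have

$$\begin{aligned} P_\theta\Big\{\max_{\beta \le t \le \alpha} \|Z_N(t)\| < C(N)\Big\} &\le P_\theta\Big\{\max_{\beta \le t \le \alpha} \|\eta_N(t)\| > \max_{\beta \le t \le \alpha} \|m(t)\| - \frac{L}{N} - C(N)\Big\} \\ &\le P_\theta\Big\{\max_{\beta \le t \le \alpha} \|\eta_N(t)\| > \Delta - \frac{L}{N} - C(N)\Big\}. \end{aligned}$$

Now we can use (C.17), replacing $\varepsilon$ by $\Delta - L/N - C(N)$, and get the exponential estimate for the event $\{k_N < k\}$.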
Case b)

In this case there exists a stationary sub-sample of size $> [\delta N]$ that is classified as non-stationary. Then

$$P_0\{k_N > k\} \le P_0\Big\{\max_{\beta \le t \le \alpha} \|Z_N(t)\| > C(N)\Big\}. \tag{E.5}$$

The exponential estimate for the right-hand side of (E.5) can be taken from (9).
Case c)

In this case there exists a sub-sample of size $N^* > [2\delta N]$ such that the distance between a true change-point parameter $\theta_i$ and its estimate $\hat\theta_i$ is larger than $\delta$. This is exactly the case of Theorem 3, and we get the exponential estimate of this event from (8).

Therefore, we get the exponential estimate for the event (E.3). This completes the proof of Theorem 5.